Graphics development is undergoing a significant paradigm shift away from a fixed function rendering approach, in which triangles are simply sent to the graphics card for rendering, toward a shader-based rendering approach. This document describes the shader-based approach and outlines its benefits.
Since the introduction of the GeForce 3, every generation of graphics chipsets has introduced more powerful ways to program the hardware at the vertex or pixel level. In the first generation, this was limited to approximately 20 commands in a primitive GPU assembly language, which were called "register combiners." This quickly gave rise to higher-level programming languages such as HLSL, GLSL and Cg. Shader-based graphics has taken over to such an extent that even the fixed function APIs are now emulated by hardware vendors using shaders.
The usual way to use shaders is to load a vertex program and a pixel program as a matched pair onto the graphics card (a.k.a. GPU). The stages of processing, from the CPU to the vertex shader to the pixel shader, are interconnected, because the outputs from one stage become the inputs for the next. Modern graphics cards have shader processing units that can be dynamically tasked to run either vertex shaders or pixel shaders. These are similar to CPU cores, except that instead of the four cores you might find on the latest x86 processor from Intel or AMD, you can have as many as one thousand on a single graphics card.
HLSL (and its close relative, GLSL, for OpenGL) is the language we at Tech Soft 3D have chosen to use. It is very similar to C, with only a few minor differences. For example, vectors and matrices are native types in the language, but there are no integers. The most restrictive limitation is that no data from neighboring vertices or pixels is accessible. This restriction exists because vertices and fragments must remain free to execute out of order, which prevents synchronization bottlenecks.
In the last two generations, in addition to including new instructions and relaxing restrictions, hardware vendors have added entirely new stages into the pipeline. Since our dx9 driver is based on DirectX 9, we have not targeted these yet, but plan to do so aggressively with our new dx11 driver.
As outlined below, there are several compelling arguments for moving away from fixed function graphics in favor of shaders. Fixed function is a concept that made sense in the days when you could fit 100,000 transistors on a chip. The latest chipsets have 1.3 billion transistors, and fixed function maps less and less well onto the new hardware configuration.
The shaders that Tech Soft 3D has written start from a single file that combines all possible features that can be accessed; the features that are not needed for a given draw are stripped out. The shaders are compiled upon first use and then cached for quick access later. This approach was based on the observation that although the combinations of features are effectively infinite, the number of combinations actually enabled in any specific run is quite small. Thus, if you have a triangle with one texture, two lights and a shadow map, we will use shaders that have exactly those features and nothing else, saving the cycles that would normally be wasted with other implementations.
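The compile-on-first-use scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not our actual driver code: the feature flags, the `ShaderCache` class, the `CompiledShader` stand-in and the `#define` names are all hypothetical, and "compiling" is reduced to building the preprocessor preamble that would strip unused features from the single source file.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical feature flags; the real feature set is not listed in the text.
enum Feature : std::uint32_t {
    kTexture   = 1u << 0,
    kLighting  = 1u << 1,
    kShadowMap = 1u << 2,
};

// Stand-in for a compiled GPU program; here it only records the preamble
// that would be prepended to the single all-features shader file.
struct CompiledShader {
    std::string preamble;
};

class ShaderCache {
public:
    // Returns the variant for this feature set, building it on first use only.
    const CompiledShader& get(std::uint32_t features, int lightCount) {
        const std::uint64_t key =
            (std::uint64_t(features) << 32) | std::uint32_t(lightCount);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++compileCount_;  // a real driver would invoke the shader compiler here
            it = cache_.emplace(key,
                     CompiledShader{buildPreamble(features, lightCount)}).first;
        }
        return it->second;
    }

    // Number of variants actually built so far.
    int compileCount() const { return compileCount_; }

    // Emits one #define per enabled feature; anything not defined is assumed
    // to be compiled out of the shader source by #ifdef blocks.
    static std::string buildPreamble(std::uint32_t features, int lightCount) {
        std::string p;
        if (features & kTexture)   p += "#define USE_TEXTURE 1\n";
        if (features & kLighting)  p += "#define LIGHT_COUNT " + std::to_string(lightCount) + "\n";
        if (features & kShadowMap) p += "#define USE_SHADOW_MAP 1\n";
        return p;
    }

private:
    std::unordered_map<std::uint64_t, CompiledShader> cache_;
    int compileCount_ = 0;
};
```

In this sketch, requesting the "one texture, two lights, shadow map" combination twice builds a single variant; only a new combination triggers another build.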
The biggest performance benefits of shaders come where we are able to take a step back and look at different algorithms for solving problems. Take, for example, the problem of drawing a filled circle. The first step is to run the center and radius through a transform to determine how much of the screen it occupies. Next, you need to produce as many (or close to as many) triangles as you have pixels in the outline. The triangles you need will be view-dependent, so they can't be cached in the GPU's memory, making things even worse. Using shaders, however, you simply draw a view-independent square and load a program that uses the condition x² + y² < r² to determine which pixels will be drawn. That's it!
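To make the circle example concrete, here is a small CPU-side sketch of what the pixel program does. It is written in C++ rather than HLSL so that it can be run anywhere, and the function names and the character-grid "framebuffer" are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Per-pixel test from the text: a pixel is drawn when x^2 + y^2 < r^2,
// with (x, y) measured from the circle center.
bool insideCircle(float x, float y, float r) {
    return x * x + y * y < r * r;
}

// CPU stand-in for the pixel shader: visit every pixel of the bounding
// square and keep only those that pass the circle test.
std::vector<std::string> drawFilledCircle(int size, float radius) {
    assert(size > 0 && radius >= 0.0f);
    std::vector<std::string> rows(size, std::string(size, '.'));
    const float center = size * 0.5f;
    for (int py = 0; py < size; ++py) {
        for (int px = 0; px < size; ++px) {
            // Sample at the pixel center, relative to the circle center.
            const float x = (px + 0.5f) - center;
            const float y = (py + 0.5f) - center;
            if (insideCircle(x, y, radius)) rows[py][px] = '#';
        }
    }
    return rows;
}
```

On the GPU the two loops disappear: the hardware runs the test once per covered pixel, in parallel, and the only geometry submitted is the square.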
There are many more examples where view-dependent calculations that once needed CPU involvement can stay native on the GPU, such as environment mapping in orthographic projections, placement of vertices for wide lines, and flipping normals to face the camera in cases where we do not have a firm definition of inside vs. outside on surfaces. All of these benefit from the superior parallelism of GPUs, which in the latest generation can have over a thousand processor cores.
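Of these, flipping normals toward the camera is the simplest to illustrate. The sketch below shows the usual dot-product test in C++; the `Vec3` type and the function names are hypothetical, and a shader would perform the same comparison once per vertex or per pixel.

```cpp
// Minimal 3-component vector; hypothetical type for illustration only.
struct Vec3 {
    float x, y, z;
};

float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Returns the normal, flipped if necessary so that it points toward the viewer.
// toViewer is the direction from the surface point to the camera.
Vec3 faceForward(const Vec3& normal, const Vec3& toViewer) {
    if (dot(normal, toViewer) < 0.0f) {
        return Vec3{-normal.x, -normal.y, -normal.z};
    }
    return normal;
}
```

Because the result depends on the camera position, doing this on the CPU would mean touching every normal whenever the view changes; in a shader it costs one dot product per invocation.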
Fixed function hardware is long gone. It has been replaced by the hardware vendors with emulation based on shaders. This emulation layer has, over time, become less and less of the focus of their development efforts because it is no longer seen as the "common case." We frequently find that the flexibility in shaders allows us to find new ways to work around defects in hardware implementations that would have previously been avoidable only by turning off important features.
With more control, we find that there is less state to manage. Additionally, we were previously finding that much state was interconnected and in some cases incompatible. Managing the resulting complexity was at times quite difficult. So far, our fixed function driver has 45 workarounds for specific bugs that are paired with the text strings returned from OpenGL for GL_RENDERER. In many cases, those workarounds disable code paths that we would prefer to have available were it not for their stability problems.
By contrast, we are able to use a common code base to target both dx9 and OpenGL using shaders. Approximately two thirds of the code, both on the shader side and on the C++ side, is in the platform-independent section. This is especially impressive since the shaders are nominally in a different programming language (albeit a fairly similar one).
Shaders allow for techniques that simply cannot otherwise be accomplished with reasonable performance in a real-time graphics system. One of the more spectacular examples of these is shadow maps. Although basic shadow mapping could be achieved in fixed function, aliasing artifacts make the technique impractical in such systems. Multisampled shadow maps greatly reduce the visual problems that are inherent in the algorithm.
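One common way to take multiple samples from a shadow map is percentage-closer filtering, in which the depth comparison is repeated over a small neighborhood and the results are averaged. The sketch below assumes that technique; the text does not say which filter we use, and the `ShadowMap` type, the 3x3 kernel and the bias parameter are illustrative choices only.

```cpp
#include <algorithm>
#include <vector>

// Depth of the nearest occluder as seen from the light, stored row-major.
// Hypothetical CPU stand-in for a shadow map texture.
struct ShadowMap {
    int width;
    int height;
    std::vector<float> depth;

    // Reads a texel, clamping coordinates to the map edges.
    float at(int x, int y) const {
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return depth[y * width + x];
    }
};

// Single comparison: 1 means lit, 0 means in shadow. This is the hard-edged
// result that produces visible aliasing.
float shadowSingle(const ShadowMap& m, int x, int y, float receiverDepth, float bias) {
    return (receiverDepth - bias <= m.at(x, y)) ? 1.0f : 0.0f;
}

// Averages the comparison over a 3x3 neighborhood, giving a fractional
// light value that softens the shadow boundary.
float shadowFiltered(const ShadowMap& m, int x, int y, float receiverDepth, float bias) {
    float lit = 0.0f;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            lit += shadowSingle(m, x + dx, y + dy, receiverDepth, bias);
        }
    }
    return lit / 9.0f;
}
```

Near a shadow boundary the single-sample version jumps between 0 and 1, while the filtered version returns intermediate values, which is what hides the stair-stepping.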
Another area where shaders help is in the handling of transparency. Transparency is an inherently tricky problem in graphics because blending needs to be done with a consistent ordering, usually back to front. In the past, that was done by deferring and then sorting primitives. Depth peeling avoids that sorting by peeling off one layer at a time in the pixel shader. It requires multiple passes, but all the data can be kept resident on the GPU, so it is usually a big net performance win. This technique belongs in both the performance and fidelity categories because it delivers improvements to both.
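The idea behind depth peeling can be sketched for a single pixel as follows. This is a CPU illustration under assumptions, not our implementation: the `Fragment` type, the grayscale color and the function names are hypothetical. Each call to `peelNextLayer` plays the role of one rendering pass, which keeps only the nearest fragment lying behind the layer peeled in the previous pass.

```cpp
#include <limits>
#include <vector>

// One transparent surface covering the pixel; hypothetical type.
struct Fragment {
    float depth;  // distance from the camera
    float gray;   // single-channel color, for brevity
    float alpha;  // opacity in [0, 1]
};

// One peeling pass: index of the nearest fragment strictly behind lastDepth,
// or -1 when no layers remain. Fragments at exactly equal depth are treated
// as one layer in this sketch.
int peelNextLayer(const std::vector<Fragment>& frags, float lastDepth) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(frags.size()); ++i) {
        if (frags[i].depth > lastDepth &&
            (best < 0 || frags[i].depth < frags[best].depth)) {
            best = i;
        }
    }
    return best;
}

// Peels layers front to back without ever sorting the input, then blends
// the peeled layers back to front over the background.
float resolvePixel(const std::vector<Fragment>& frags, float background) {
    std::vector<Fragment> layers;
    float lastDepth = -std::numeric_limits<float>::infinity();
    for (;;) {
        const int i = peelNextLayer(frags, lastDepth);
        if (i < 0) break;
        layers.push_back(frags[i]);
        lastDepth = frags[i].depth;
    }
    float color = background;
    for (auto it = layers.rbegin(); it != layers.rend(); ++it) {
        color = it->alpha * it->gray + (1.0f - it->alpha) * color;
    }
    return color;
}
```

The result does not depend on the order in which the fragments were submitted, which is the property that lets the primitives stay unsorted and resident on the GPU.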
Post-processing image-based effects can also add significantly to the perceived quality of the images. Bloom fakes some of the perception of high dynamic range by blurring highlights such that they affect a larger area than normal. Also particularly compelling are the effects that take the depth buffer as part of their input. Pixel shader-based silhouette edge detection produces cleaner results than standard techniques, because the edges do not stitch against the neighboring faces. Screen-space ambient occlusion gets some of the effects of global illumination at a tiny fraction of the cost (2 milliseconds as opposed to 2 minutes). Since these effects operate only on the output images, they could to some degree be retrofitted into a graphics system that was in all other ways a fixed function implementation. However, shaders can transform their outputs into slightly different forms that greatly improve the effectiveness of these techniques.
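As one example of an effect that takes the depth buffer as input, silhouette edges can be found by looking for depth discontinuities between neighboring pixels. The sketch below assumes that simple approach; the text does not describe the exact detector we use, and the `DepthBuffer` type, the two-neighbor comparison and the threshold are illustrative assumptions.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Row-major depth image; hypothetical CPU stand-in for the depth buffer.
struct DepthBuffer {
    int width;
    int height;
    std::vector<float> depth;

    // Reads a pixel, clamping coordinates to the image edges.
    float at(int x, int y) const {
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return depth[y * width + x];
    }
};

// Flags a pixel as a silhouette edge when its depth differs from the pixel
// to its right or the pixel below it by more than the threshold.
bool isSilhouette(const DepthBuffer& d, int x, int y, float threshold) {
    const float center = d.at(x, y);
    return std::fabs(d.at(x + 1, y) - center) > threshold ||
           std::fabs(d.at(x, y + 1) - center) > threshold;
}
```

Because the test runs on the finished image rather than on the geometry, the edge is drawn as part of the picture itself and has no faces to stitch against.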
Future generations will continue to bring ever more powerful concepts to the table, none of which will be accessible without the use of shaders. In particular, we at Tech Soft 3D are looking forward to taking advantage of the tessellation shader pipeline stage that was introduced with Direct3D 11, and implementing a technique similar to depth peeling except that it does not require multiple passes.
In case the above-mentioned reasons are not enough to convince you to switch to shaders, it is worth mentioning that, beginning with version 10, Direct3D completely eliminated fixed function. At least for the Direct3D side of things, the fixed function pipeline has ceased to exist.