Edd Biddulph



June 2009
Deferred Shading

This is an experiment in real-time, dynamic illumination with normal mapping and omnidirectional shadows. It also includes localised volumetric fog effects and an NVIDIA-published technique to reduce aliasing of specular highlights. I wrote the shaders in NVIDIA Cg, but I did not use all of Cg's features, and I did not use CgFX. In future I would try to make fuller use of CgFX, since it simplifies the programming process and allows a great deal of flexibility and portability. Since creating this demo, I have discovered that Cg and HLSL are very similar, so I could try porting this to DirectX in the future.

As well as being a small demonstration of surface illumination, this program could be extended with more features and probably built into a full game. It is essentially a rendering subsystem with some very simple interactivity and animation added on top. The rendering subsystem takes a scene description consisting of textures, actors, and a visibility look-up table built over a BSP (binary space partitioning) structure for the level geometry. The BSP node and visibility data are produced by a separate program, but the rendering subsystem could be modified to use an occlusion culling method better suited to today's tools and consumer hardware. The scene format carries an identifying token and a version number to prevent corrupt or incorrect files from being read. Frustum culling is also used to skip out-of-view scene elements; this applies to shadowmap rendering as well.
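The token and version check can be sketched as follows. The header layout used here (a four-byte token `SCNE` followed by a 32-bit little-endian version number, expected to be 3) is purely an assumption for illustration and is not the demo's actual file format; `checkSceneHeader` and `HeaderStatus` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical header layout, for illustration only: a 4-byte identifying
// token followed by a 32-bit little-endian version number.
constexpr char kSceneToken[4] = {'S', 'C', 'N', 'E'};
constexpr std::uint32_t kSceneVersion = 3;

enum class HeaderStatus { Ok, Truncated, BadToken, BadVersion };

// Inspect the first bytes of a scene file before parsing anything else.
HeaderStatus checkSceneHeader(const unsigned char* data, std::size_t size) {
    if (size < 8) return HeaderStatus::Truncated;
    if (std::memcmp(data, kSceneToken, 4) != 0) return HeaderStatus::BadToken;
    // Assemble the version byte by byte so the check does not depend on host endianness.
    const std::uint32_t version = std::uint32_t(data[4]) |
                                  (std::uint32_t(data[5]) << 8) |
                                  (std::uint32_t(data[6]) << 16) |
                                  (std::uint32_t(data[7]) << 24);
    if (version != kSceneVersion) return HeaderStatus::BadVersion;
    return HeaderStatus::Ok;
}
```

A loader would refuse to parse any file for which this does not return `HeaderStatus::Ok`, so output from another tool or an older exporter fails early instead of being misread.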

The deferred shading proceeds as follows. Extra data, in addition to colours, are stored in a superbuffer (also known as a geometry buffer, or g-buffer, in some texts) with a floating-point pixel format. The data can be anything, but in this case include camera-relative coordinates and surface normals. This buffer is populated in a first pass over the scene, which applies simple shaders to emit the desired data. The buffer is then read by the lighting calculations in subsequent passes. The benefit is that the illumination algorithms are decoupled from the geometry-building algorithms, and no work is wasted lighting hidden surfaces, because only the nearest surface at each pixel survives into the buffer. In the past this also meant that shaders could each focus on a particular effect, but this matters less today with mechanisms such as OpenGL program pipeline objects. Deferred shading is closely related to the concepts of image-based rendering and relighting.
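To make the two-pass structure concrete, here is a CPU-side sketch of the lighting pass reading a g-buffer. In the demo this work is done by Cg fragment shaders on the GPU; the texel layout, the `covered` flag, the point-light representation and the linear attenuation are all assumptions chosen to keep the example short.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
static Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
static Vec3 operator*(Vec3 a, Vec3 b) { return {a.x * b.x, a.y * b.y, a.z * b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// One texel of a hypothetical g-buffer, written by the geometry pass and
// read by the lighting pass. Positions are camera-relative.
struct GBufferTexel {
    Vec3 position;
    Vec3 normal;   // unit length
    Vec3 albedo;
    bool covered;  // false where no geometry was drawn
};

struct PointLight { Vec3 position; Vec3 colour; float radius; };

// Lighting pass: each light is evaluated only against the surfaces stored in
// the g-buffer, i.e. those that won the depth test, so hidden surfaces are never lit.
std::vector<Vec3> shadeGBuffer(const std::vector<GBufferTexel>& gbuffer,
                               const std::vector<PointLight>& lights) {
    std::vector<Vec3> out(gbuffer.size(), Vec3{0, 0, 0});
    for (const PointLight& light : lights) {
        for (std::size_t i = 0; i < gbuffer.size(); ++i) {
            const GBufferTexel& t = gbuffer[i];
            if (!t.covered) continue;
            Vec3 toLight = light.position - t.position;
            float dist = std::sqrt(dot(toLight, toLight));
            if (dist >= light.radius || dist == 0.0f) continue;  // outside the light volume
            Vec3 l = toLight * (1.0f / dist);
            float lambert = std::max(0.0f, dot(t.normal, l));
            // Assumed linear falloff reaching zero at the light's radius.
            float attenuation = 1.0f - dist / light.radius;
            out[i] = out[i] + t.albedo * light.colour * (lambert * attenuation);
        }
    }
    return out;
}
```

Adding another light only adds another loop over the stored texels; the scene geometry is never touched again, which is the decoupling described above.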

The shadowmaps are cubemaps storing two values per texel (the first two moments of the depth, i.e. depth and depth squared) in order to perform variance shadowmapping. This technique allows the values in the map to be filtered before the depth comparison is made. Although the best way to take advantage of variance shadowmaps is to run an optimised filter over them before rendering, this demo filters whilst shading.
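Filtering before comparison works because averaging the two stored moments stays meaningful, and Chebyshev's inequality then gives an upper bound on the fraction of light reaching the receiver. The sketch below assumes the stored values are the mean distance and mean squared distance from the light, and `minVariance` is a hypothetical tuning clamp; it shows the standard variance shadowmapping test rather than code taken from the demo.

```cpp
#include <algorithm>

// The two values stored per shadowmap texel: first and second moments of the
// occluder distance. Unlike raw depths, these can be averaged meaningfully.
struct Moments { float m1, m2; };

// Average the samples taken by a shading-time filter. count must be positive.
Moments averageMoments(const Moments* samples, int count) {
    Moments sum{0.0f, 0.0f};
    for (int i = 0; i < count; ++i) {
        sum.m1 += samples[i].m1;
        sum.m2 += samples[i].m2;
    }
    return {sum.m1 / count, sum.m2 / count};
}

// Chebyshev upper bound on the lit fraction for a receiver at distance d.
// minVariance avoids division by zero where all filtered samples agree.
float visibility(Moments m, float d, float minVariance = 1e-4f) {
    if (d <= m.m1) return 1.0f;  // receiver is no further than the mean occluder
    float variance = std::max(m.m2 - m.m1 * m.m1, minVariance);
    float delta = d - m.m1;
    return variance / (variance + delta * delta);
}
```

Because the bound varies smoothly with the filtered moments, a blurred map yields soft shadow edges without taking many separate depth comparisons.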

To make sure the filtering kernel was aligned correctly with the cubemap faces, I built a small cubemap containing tangent vectors from which a local frame could be derived. This spares the shader from performing conditional operations when sampling shadowmaps. The shadow filter also applies a random rotation to its sampling points using a screen-aligned noise texture, based on a technique published in a Crytek paper on the graphical effects in their game engine (at the time this was Cryengine 2).

C++ function pointers are used to switch between rendering modes, allowing the same code to be used for both shadowmap and camera rendering while still allowing each to be optimised. Mailboxing is used in conjunction with BSP leaf face references: it ensures that a surface is not rendered twice in a single frame even if more than one leaf references it. The frustum culling uses an efficient box-versus-frustum intersection test which requires only one dot product per frustum plane. Light sphere geometry is generated from a tessellated cube whose vertices are then normalised.
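The frustum test and the mailboxing can be sketched as below. The box test is the common "positive vertex" formulation, assumed here to correspond to the one-dot-product test described; it is conservative and can occasionally accept a box lying just outside a frustum corner. The plane convention, `Face`, and `collectFaces` are hypothetical, and frame numbers are assumed to start at 1 so that a freshly initialised mailbox never matches.

```cpp
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };
struct Plane { Vec3 n; float d; };  // a point p is inside where dot(n, p) + d >= 0
struct AABB { Vec3 min, max; };

// Conservative box-versus-frustum test: for each plane, pick the box corner
// furthest along the plane normal (the "positive vertex") and test only that
// corner, costing one dot product per plane.
bool boxMayBeVisible(const AABB& box, const Plane (&frustum)[6]) {
    for (const Plane& p : frustum) {
        Vec3 v{p.n.x >= 0 ? box.max.x : box.min.x,
               p.n.y >= 0 ? box.max.y : box.min.y,
               p.n.z >= 0 ? box.max.z : box.min.z};
        if (p.n.x * v.x + p.n.y * v.y + p.n.z * v.z + p.d < 0) return false;
    }
    return true;  // inside or intersecting (occasionally a false positive near corners)
}

// Mailboxing: each face remembers the last frame in which it was queued, so a
// face referenced by several visible BSP leaves is still drawn once per frame.
struct Face { std::uint32_t mailbox = 0; /* geometry omitted */ };

std::vector<int> collectFaces(std::vector<Face>& faces,
                              const std::vector<std::vector<int>>& visibleLeafFaceRefs,
                              std::uint32_t frame) {  // frame numbers start at 1
    std::vector<int> toDraw;
    for (const auto& leaf : visibleLeafFaceRefs)
        for (int f : leaf)
            if (faces[f].mailbox != frame) {
                faces[f].mailbox = frame;
                toDraw.push_back(f);
            }
    return toDraw;
}
```

Stamping faces with the frame number avoids clearing a "visited" flag on every face at the start of each frame.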

This was first created in 2009, but a couple of years later I went back and fixed a few things. First, I changed the superbuffer to use 16-bit floats instead of 32-bit floats. I did the same for the shadowmaps, and the result was a noticeable speedup, since the smaller format halves the memory bandwidth consumed when the buffers are written and read. A reduction in memory usage is always welcome anyway, and the smaller texels probably also improved texture cache utilisation. Second, I switched the renderer from plain vertex arrays to vertex buffer objects for both the vertex coordinates and the indices.
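As a rough illustration of why the switch helped, the arithmetic below compares storage for 32-bit and 16-bit channels. The resolutions and channel counts are assumed example values, not the demo's actual buffer sizes; whatever the sizes, halving the bytes per channel halves the data that has to be written and read each frame.

```cpp
#include <cstddef>

// Bytes needed for one render target or shadow texture.
constexpr std::size_t textureBytes(std::size_t width, std::size_t height,
                                   std::size_t faces, std::size_t channels,
                                   std::size_t bytesPerChannel) {
    return width * height * faces * channels * bytesPerChannel;
}

// Assumed example sizes: a 1280x720 four-channel g-buffer target and a
// 512x512 two-channel variance shadow cubemap (six faces).
constexpr std::size_t kTarget32 = textureBytes(1280, 720, 1, 4, 4);  // 14,745,600 bytes
constexpr std::size_t kTarget16 = textureBytes(1280, 720, 1, 4, 2);  //  7,372,800 bytes
constexpr std::size_t kCube32 = textureBytes(512, 512, 6, 2, 4);     // 12,582,912 bytes
constexpr std::size_t kCube16 = textureBytes(512, 512, 6, 2, 2);     //  6,291,456 bytes
```

The g-buffer is read once per light in the lighting passes, so the saving is multiplied by the number of lights rather than paid once.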

In addition, I replaced the textures and meshes I was using with my own. The mesh was created in Blender, and the textures are synthesised when the program runs. My own procedural texture generation class is now included, in the files GenTex.hpp and GenTex.cpp. It allows instruction lists to be created at runtime and then executed to produce an image at any resolution. GenTex is similar to a shader system, but it is intended to build textures with C++ functions rather than to render images as such. It's a work in progress, but I needed a quick way to create normalmaps for this demo.
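GenTex itself is not reproduced here, but the idea of a resolution-independent instruction list, followed by conversion of a height field into a normalmap, can be sketched as below. `Instruction`, `Image`, `execute` and `heightToNormals` are hypothetical names, not the GenTex API, and the central-difference normal calculation is one common choice rather than necessarily the one GenTex uses.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// A hypothetical instruction: receives the normalised texel coordinate and the
// value produced so far, and returns a new value. This is NOT the GenTex API.
using Instruction = std::function<float(float u, float v, float value)>;

struct Image {
    int width, height;
    std::vector<float> texels;  // row-major, single channel
    float at(int x, int y) const {
        // Wrap coordinates so the texture tiles seamlessly.
        x = (x % width + width) % width;
        y = (y % height + height) % height;
        return texels[y * width + x];
    }
};

// Run the instruction list once per texel. Instructions only see normalised
// coordinates, so the same list can be executed at any resolution.
Image execute(const std::vector<Instruction>& program, int width, int height) {
    Image img{width, height, std::vector<float>(std::size_t(width) * height, 0.0f)};
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x) {
            float u = (x + 0.5f) / width, v = (y + 0.5f) / height;
            float value = 0.0f;
            for (const Instruction& op : program) value = op(u, v, value);
            img.texels[y * width + x] = value;
        }
    return img;
}

struct Normal { float x, y, z; };

// Derive tangent-space normals from a height field using central differences;
// 'strength' scales the apparent bumpiness.
std::vector<Normal> heightToNormals(const Image& h, float strength) {
    std::vector<Normal> out;
    out.reserve(h.texels.size());
    for (int y = 0; y < h.height; ++y)
        for (int x = 0; x < h.width; ++x) {
            float dx = (h.at(x + 1, y) - h.at(x - 1, y)) * 0.5f * strength;
            float dy = (h.at(x, y + 1) - h.at(x, y - 1)) * 0.5f * strength;
            float len = std::sqrt(dx * dx + dy * dy + 1.0f);
            out.push_back({-dx / len, -dy / len, 1.0f / len});
        }
    return out;
}
```

For example, a list holding a sine of `u` followed by a remap into [0, 1] produces the same ridged height pattern at 64×64 or 1024×1024, and `heightToNormals` turns either result into a normalmap.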

Download (includes a Win32 executable and source; the source code is licensed under the zlib license) | YouTube

Some information on binary space partitioning.

This is a pretty good (if old) article on the tricks used in S.T.A.L.K.E.R.

I used this to reduce distracting sparkles on specular highlights due to mipmapping.