Main

GDC 2024 - Global Illumination With AMD FidelityFX™ Brixelizer, Plus AMD FidelityFX SDK Updates

This talk briefly discusses how AMD FidelityFX™ Brixelizer works, then explores how diffuse and specular global illumination is implemented with sparse distance fields in Brixelizer GI. It provides a step-by-step breakdown of the algorithm and presents solutions to problems such as populating the radiance cache and handling static and dynamic geometry. The talk also features image quality comparisons against ground truth ray-traced global illumination and some initial performance numbers, to give the audience an idea of what to expect from Brixelizer GI. It also showcases the Brixelizer GI API and how it fits into an existing Brixelizer integration. Finally, it previews new samples releasing with the new version of the FidelityFX SDK, as well as updates to existing samples.

Download the slides: https://gpuopen.com/gdc-presentations/2024/GDC2024_GI_with_AMD_FidelityFX_Brixelizer.pdf

©2024 Advanced Micro Devices, Inc. AMD, the AMD Arrow Logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.


Hello, and welcome to this talk on Global Illumination with AMD FidelityFX Brixelizer. My name is Dihara Wijetunga, and I'm a Senior Software Engineer in AMD's Game Engineering Core Technology Group. We'll start with a brief recap of Brixelizer and how it works. Then we'll introduce Brixelizer GI, go over the algorithm step by step, and look at some ground truth comparisons and performance numbers. Then we'll see what the API looks like and how it integrates into your application. And we'll conclude with updates to the FidelityFX SDK. So let's get started.

Brixelizer is a FidelityFX library that generates sparse distance fields for triangle geometry in real time, for efficiently tracing rays against your scene. It works with both static and dynamic geometry and provides a shader API to trace rays against the distance field. It will be available in the upcoming release of the FidelityFX SDK. Now let's briefly go over how Brixelizer works to get everyone up to speed.
Brixelizer generates cascades of distance fields around a given position, in most cases the camera position. Each cascade is a voxel grid, 64 voxels along each axis. If a voxel intersects any geometry, a local distance field is generated within that voxel, and these local distance fields are known as bricks. A brick is an 8x8x8 3D texture in R8 unorm format, and all cascades allocate their bricks from a global Brick Atlas 3D texture. A texel within a brick is known as a Brixel.

Once bricks have been created, we build a three-level AABB tree for each cascade. This AABB tree is used for ray-scene traversal: we start at the top of the tree and find the leaf node that the ray intersects, then switch to ray-marching the distance field until we find an intersection point.
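The ray-marching step can be sketched in isolation. Below is a generic sphere-tracing loop against a toy signed distance field, not Brixelizer's actual traversal code (the real implementation first walks the per-cascade AABB tree to find candidate bricks); all names are illustrative.

```cpp
#include <cmath>

// Minimal sphere-tracing sketch: march a ray through a signed distance
// field, stepping by the distance to the nearest surface each iteration,
// until the distance falls below a hit threshold.

struct Vec3 { float x, y, z; };

static Vec3 along(Vec3 o, Vec3 d, float t) {
    return { o.x + d.x * t, o.y + d.y * t, o.z + d.z * t };
}

// Example distance field: a sphere of radius 1 at the origin.
static float sdfSphere(Vec3 p) {
    return std::sqrt(p.x * p.x + p.y * p.y + p.z * p.z) - 1.0f;
}

// Returns t >= 0 on hit, -1 on miss.
float rayMarch(Vec3 origin, Vec3 dir, float tMax) {
    const float kHitEps = 1e-3f;
    float t = 0.0f;
    for (int i = 0; i < 128 && t < tMax; ++i) {
        float d = sdfSphere(along(origin, dir, t));
        if (d < kHitEps)
            return t;   // close enough to the surface: report a hit
        t += d;         // safe step: nothing is closer than d
    }
    return -1.0f;       // ran out of steps or left the interval
}
```

A ray starting at (0, 0, -3) and pointing down +z hits the unit sphere at t = 2; a ray pointing away from it never converges and reports a miss.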
So that was a high-level view of Brixelizer. Now let's look at Brixelizer GI, a fast, approximate global illumination solution built on top of Brixelizer. It's entirely compute-based, with no need for hardware-accelerated ray tracing. The main motivation is to provide a fallback for ray-traced GI, so that lower-end hardware can also have a form of dynamic global illumination. It will be available as part of the next FidelityFX SDK release, as a library that complements Brixelizer, and all of the C++ and shader source will be available under the MIT license, as always.

The high-level idea of Brixelizer GI is that it takes the output resources of Brixelizer, plus the G-buffer from the application, and provides diffuse and specular GI outputs that you can composite into your final shading output. Here are screenshots of the direct lighting output alone, and then with the addition of indirect lighting through Brixelizer GI. Here are a few more screenshots, and here's the mandatory screenshot of Sponza.

Brixelizer GI is a much-simplified implementation of AMD's GI-1.0 algorithm, which is available alongside the Capsaicin framework. Like GI-1.0, we use screen space probes, backed by a world space radiance and irradiance cache. Let's take a brief walkthrough of the algorithm.
Here you can see a cross-section of the G-buffer surface and the voxel grid that surrounds the geometry; the voxels that contain actual geometry are highlighted in green. We start by populating our radiance cache: for each primary ray hit point, we find the corresponding brick, reproject the direct lighting from the previous frame, and inject it into the radiance cache. Next we spawn screen probes on the G-buffer surface, trace diffuse rays from those probes, and sample the radiance cache for shading. We then feed the world space irradiance cache using the screen probes we just updated. Afterwards, we trace specular rays from the G-buffer, again sampling the radiance cache for shading. Finally, we resolve the diffuse GI using the screen probes and the irradiance cache.

Here is the same algorithm in more detail. This is a lot to take in, but we'll go through it step by step. Before we do, let's look at the data structures involved.

Why are we using screen probes? A common issue with traditional GI probes is light leaking. Take these two rooms separated by a wall: the left side is lit by a red light source and the right side is fully unlit. If we shade the position shown here on the right side, some of the surrounding probes are on the lit side of the room, so when we interpolate the probes, some of the red light leaks from the left side into the right side. You can mitigate this to an extent with smarter probe placement, but it's not fully solvable. The DDGI algorithm does address it by storing what is essentially a depth map at each probe and evaluating it during sampling. Screen probes, however, sidestep the issue entirely by placing probes only on the visible surfaces in the depth buffer. Take this scenario: if we shade this point using these screen probes, we can clearly ignore the top probe, as it is too far away, and simple depth and normal comparisons are enough to weigh probe contributions. Screen probes also have the benefit of being denser than a world space probe grid, especially when you are close to the surface being viewed. The screen probes are represented using an 8x8 octahedral encoding, as seen here.
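A common form of the octahedral mapping between unit directions and a square of texels looks like the following sketch. Brixelizer GI's exact encoding may differ in details such as texel-center alignment, so treat this as illustrative.

```cpp
#include <array>
#include <cmath>

// Octahedral mapping: fold a unit direction onto [0,1]^2 so a small tile
// (such as an 8x8 probe) can cover the whole sphere of directions.

struct V3 { float x, y, z; };

static V3 normalize(V3 v) {
    float l = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / l, v.y / l, v.z / l };
}

// Direction -> octahedral UV in [0,1]^2.
std::array<float, 2> octEncode(V3 d) {
    float s = std::fabs(d.x) + std::fabs(d.y) + std::fabs(d.z);
    float u = d.x / s, v = d.y / s;
    if (d.z < 0.0f) {                 // fold the lower hemisphere outward
        float pu = (1.0f - std::fabs(v)) * (u >= 0.0f ? 1.0f : -1.0f);
        float pv = (1.0f - std::fabs(u)) * (v >= 0.0f ? 1.0f : -1.0f);
        u = pu; v = pv;
    }
    return { u * 0.5f + 0.5f, v * 0.5f + 0.5f };
}

// Octahedral UV -> unit direction (inverse of octEncode).
V3 octDecode(float u, float v) {
    float fu = u * 2.0f - 1.0f, fv = v * 2.0f - 1.0f;
    float z = 1.0f - std::fabs(fu) - std::fabs(fv);
    if (z < 0.0f) {                   // unfold the lower hemisphere
        float pu = (1.0f - std::fabs(fv)) * (fu >= 0.0f ? 1.0f : -1.0f);
        float pv = (1.0f - std::fabs(fu)) * (fv >= 0.0f ? 1.0f : -1.0f);
        fu = pu; fv = pv;
    }
    return normalize({ fu, fv, z });
}
```

Encoding then decoding a direction round-trips it, and the center of the tile maps to the +z direction.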
But if we have screen probes, why have a world space irradiance cache? While screen probes are responsive, they can also be noisy: on a disocclusion, the probes have to restart accumulation from scratch, which leads to boiling artifacts. The world space cache, on the other hand, is coarser but more stable and persistent, which makes it an ideal fallback whenever the screen probes fail. Its entries are stored as second-order spherical harmonics, so 9 coefficients per color channel, in one large buffer that is shared by every Brixelizer cascade.

And finally, why are we using a radiance cache? There are a few reasons, but mainly because we do not have access to the material data and UVs in the distance field, so we can't shade the hit points when we trace using Brixelizer. It also greatly simplifies integration: modern game engines have many different types of light sources, materials, and shading models, and exposing all of them is difficult. With the radiance cache, the shading is left to the application, and Brixelizer GI caches those results and looks them up whenever they are needed. The cache is stored as a 3D texture atlas in R11G11B10 format with dimensions of 256 texels per axis. A brick inside the radiance cache is 4x4x4 texels and represents the radiance throughout the volume of the brick, so it's half the size of a Brixelizer brick.
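To make the atlas layout concrete, here is a hypothetical addressing scheme for a 256^3 atlas of 4^3 bricks (64 bricks per axis). The linear brick index and the layout order are assumptions for illustration only; the SDK's actual addressing may differ.

```cpp
#include <array>
#include <cstdint>

// Hypothetical radiance-cache atlas addressing: 256 texels per axis,
// 4x4x4 texels per brick, so 64 bricks per axis (64^3 brick slots).

constexpr uint32_t kAtlasDim      = 256;                    // texels per axis
constexpr uint32_t kBrickDim      = 4;                      // texels per brick axis
constexpr uint32_t kBricksPerAxis = kAtlasDim / kBrickDim;  // 64

// Map a linear brick index plus a local texel coordinate (0..3 per axis)
// to an absolute texel coordinate in the atlas, assuming x-major ordering.
std::array<uint32_t, 3> brickTexel(uint32_t brickIndex,
                                   uint32_t lx, uint32_t ly, uint32_t lz) {
    uint32_t bx = brickIndex % kBricksPerAxis;
    uint32_t by = (brickIndex / kBricksPerAxis) % kBricksPerAxis;
    uint32_t bz = brickIndex / (kBricksPerAxis * kBricksPerAxis);
    return { bx * kBrickDim + lx, by * kBrickDim + ly, bz * kBrickDim + lz };
}
```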
Okay, now let's dive into the algorithm, starting with populating the radiance cache. In this pass, we populate the radiance cache with the radiance at each primary ray hit, using the previous frame's direct lighting output. Instead of doing this for every pixel, we work at quarter resolution, injecting one value per 4x4 tile. We pick a random point within the tile, reproject the direct lighting from the last frame using the provided motion vectors, reconstruct the world space position, and find the brick it corresponds to in each cascade. We then compute the UVW coordinate within the brick and accumulate the reprojected radiance into the radiance cache at that coordinate. Here's a visualization of what it looks like when we sample the radiance cache directly.

Next, we spawn some new screen probes. Screen probes are stored internally as an 8x8 octahedral projection, and one probe is spawned for each 8x8 tile of the frame. A maximum of 8 attempts is made to spawn each probe, with each attempt jittered using a Hammersley low-discrepancy sequence. Using the jittered coordinate, we sample the depth buffer and accept the sample if it doesn't belong to a sky or background pixel. We then initialize an empty probe at the reconstructed world space position and store the probe information in a buffer. This includes the depth value, the world space normal, and the random seed. Using this information, we can reconstruct the probe position as well as the sample directions later on in the pipeline.
After creating new screen probes, we attempt to reuse some of our hard work from the last frame by reprojecting the older probes, so that their radiance can be reused in the current frame's probes. We dispatch 8x8 threads per probe; each thread handles a single pixel and attempts to find a reprojection candidate. We store the reprojection weights in shared memory and pick the probe with the highest weight as the one that gets reused. Here you can see the reprojected screen probes; the holes correspond to probes that failed reprojection, and here is a closer look. We also share irradiance from the probes surrounding the reprojected probe, using normal and position similarity to weigh their contributions.

Moving along the pipeline, it's now time to trace some rays from our screen probes and add new radiance data. If a newly spawned probe fails to reuse a probe from the last frame, we inject new radiance into it. For each probe pixel, we uniformly sample the hemisphere and trace a ray using Brixelizer. If the ray hits, we use the brick ID and UVW coordinate to sample the radiance cache; if it misses, we sample the environment map.

This next pass deals with the specular GI side of things. We start by tracing rays at quarter resolution and storing the brick IDs at the hit points. Since the granularity of the radiance cache is at the brick level, the low resolution trace doesn't really affect quality, and it greatly reduces the number of rays traced. Next, in a separate full resolution dispatch, we load the brick IDs and find the intersection point within each brick. This is cheaper than a full trace, as we can go directly to the ray march. Finally, we sample the radiance cache using the brick ID and the UVW coordinate.
Next, we prepare the diffuse and specular GI outputs from the last frame for accumulation by reprojecting them into the current frame using the provided motion vectors. Internally, we build a disocclusion mask using the depth buffer and G-buffer normals from the current and previous frames; it stores a value greater than zero where there is a disocclusion, which helps us reject history. Here's what the reprojected output looks like. We've exaggerated the movement between frames here, but you can see the rejected portions of the history in black.

Now that the screen probes are filled, we prepare them to be fed into the irradiance cache. We project the 64 incoming radiance values in each screen probe into spherical harmonics; we'll use these both to feed the irradiance cache and to reconstruct the final diffuse GI output later on. We dispatch an 8x8 thread group per probe and project the radiance onto second-order spherical harmonics, storing the result in an intermediate buffer. Each thread loads the radiance value of its corresponding screen probe texel, reconstructs the ray direction, projects the radiance onto the SH coefficients, and stores them in shared memory. Finally, we perform a parallel reduction in shared memory to combine all 64 radiance values. This takes three iterations, with each active thread combining three neighboring values with its own: the first iteration uses 4x4 threads with a stride of 2 between threads, the second uses 2x2 threads with a stride of 4, and the final iteration uses a single thread that gathers the remaining three values.
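The projection itself can be sketched on the CPU. This computes the same 9 coefficients per channel that the shader produces, but with a plain loop standing in for the shared-memory parallel reduction; the constants are the standard real spherical harmonics basis factors, and the helper names are illustrative.

```cpp
#include <array>
#include <cmath>

// Project sampled radiance onto second-order (L2) spherical harmonics:
// 9 coefficients, shown here for a single scalar channel.

using SH9 = std::array<float, 9>;

// Evaluate the 9 real SH basis functions for a unit direction (x, y, z).
SH9 shBasis(float x, float y, float z) {
    return {
        0.282095f,                          // Y(0, 0)
        0.488603f * y,                      // Y(1,-1)
        0.488603f * z,                      // Y(1, 0)
        0.488603f * x,                      // Y(1, 1)
        1.092548f * x * y,                  // Y(2,-2)
        1.092548f * y * z,                  // Y(2,-1)
        0.315392f * (3.0f * z * z - 1.0f),  // Y(2, 0)
        1.092548f * x * z,                  // Y(2, 1)
        0.546274f * (x * x - y * y)         // Y(2, 2)
    };
}

// Accumulate n (direction, radiance) samples into SH coefficients.
// Each sample is weighted by 4*pi/n: the Monte Carlo estimate of the
// integral over the sphere for uniformly distributed directions.
SH9 projectRadiance(const float (*dirs)[3], const float* radiance, int n) {
    SH9 coeffs{};  // zero-initialized
    const float w = 4.0f * 3.14159265f / float(n);
    for (int i = 0; i < n; ++i) {
        SH9 b = shBasis(dirs[i][0], dirs[i][1], dirs[i][2]);
        for (int k = 0; k < 9; ++k)
            coeffs[k] += radiance[i] * b[k] * w;
    }
    return coeffs;
}
```

As a sanity check, projecting a constant radiance of 1 over the six axis directions yields a DC coefficient of about 4π × 0.282095 ≈ 3.545, with the directional coefficients cancelling to zero.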
Now the screen probes are in a format that can easily be accumulated into the irradiance cache, and we use these SH probes to feed the world space irradiance cache. Each thread loads a single SH probe, finds, for each Brixelizer cascade, which brick the probe intersects, and blends the screen probe with the SH probe in the world space irradiance cache.

With the irradiance cache updated with new radiance data, we then try to propagate some of that data. In this pass, the irradiance in the spherical harmonics probe at each brick is propagated into the neighboring bricks. This is a time-sliced update, meaning we update one cascade per frame, with the most detailed cascades updated more frequently.

Now we bring everything together in the interpolate screen probes pass. Here we reconstruct screen space irradiance by projecting the nearest 2x2 SH screen probes onto the G-buffer normal. We also reconstruct world space irradiance by interpolating the surrounding 8 world probes. Finally, we blend the two to get our final irradiance value. In the same pass, we temporally accumulate the previously reprojected diffuse and specular GI outputs.

We finish with the spatial denoise pass, a relatively simple bilateral blur that further denoises the GI outputs. The radius is determined by the number of accumulated samples: the fewer the samples, the larger the radius.

One last detail of the algorithm is how the radiance and irradiance caches are cleared. The cascades in Brixelizer are implemented as clipmaps, and bricks get invalidated as the camera moves around. To compensate, we clear the cache entries associated with invalidated bricks, relying on buffers internal to Brixelizer. At the beginning of each Brixelizer GI update, we use an indirect dispatch to clear the relevant radiance and irradiance cache entries.
And that concludes our look into the algorithm. Now let's see how Brixelizer GI stacks up against ground truth. On the left side you can see Brixelizer GI, and on the right is a quick DXR path tracer. As you can see, it captures the indirect diffuse bounce from the ground quite well, but here you can see that some of the smaller scale details are not retained as well as in the ground truth. Here are a few more comparisons; as you can see, it approximates ground truth well.

Now that we know how it works, let's look at how it performs. The passes that work on screen probes are resolution dependent, and the passes that update the radiance and irradiance caches are scene dependent, mainly based on the voxel size. Since diffuse GI is inherently low frequency, we internally output at half resolution (0.5x scale) by default, with optional settings to output at 0.75x scale or at native resolution. On the Radeon RX 7900 XTX, it takes roughly 2 milliseconds at 4K resolution. On the GeForce RTX 4080, it takes around 2.2 milliseconds, also at 4K. The more interesting result is on the RX 7600 XT, a 1080p gaming card, where it takes around 1.7 milliseconds at 1080p. This makes it a viable dynamic GI solution for lower-end hardware.

As for the memory footprint, Brixelizer needs around 16 megabytes of VRAM for internal resources, about 128 megabytes for the SDF texture atlas, and roughly 1 megabyte for each cascade you need. You also need a scratch buffer for the updates; its size depends on the complexity of your scene and can be queried using the provided function. Brixelizer GI, on the other hand, is resolution dependent, and this table gives you an idea of how much VRAM is needed at each scaling setting. At the default 0.5x scale, it takes around 110 megabytes at 1080p, 130 megabytes at 1440p, and 200 megabytes at 4K.
So Brixelizer GI does have a few limitations. Because the radiance cache is filled using screen space information, a surface needs to be viewed at least once before bounce lighting can take on its color; after that, the cached radiance persists. Likewise, changes to lighting only take effect once they are in view, so that the radiance cache can be updated. We can mitigate these issues to an extent by using extra views to inject more radiance data into the radiance cache. For example, if your application has dynamically updated reflection probes, those can be used to populate the radiance cache with more up-to-date radiance surrounding the player. Or, if you can afford some more draw calls, you can draw the portion of the scene that is out of view at quarter resolution, with simpler LODs and shading. Another limitation is that some small-scale detail is lost due to the sparsity of the screen probes; however, you can recover most of it using screen space ambient occlusion or screen space GI.

Now let's look at the Brixelizer GI API and how it integrates into an application. Brixelizer GI requires an existing Brixelizer integration, meaning you need to create the FidelityFX backend for your API of choice, as well as the Brixelizer context. Link your application against the Brixelizer GI library and include the ffx_brixelizer_gi.h header. Then create the Brixelizer GI context, passing the flags you need, such as inverted depth in this case.
Then specify the required quality setting, output resolution, and backend interface. To dispatch the Brixelizer GI compute workloads, start by filling the dispatch description structure with the Brixelizer context and the Brixelizer output resources: the SDF texture atlas and the buffers for brick AABBs, brick maps, and AABB trees. Then assign the required G-buffer resources: the depth and world space normal buffers from the current and previous frames, the motion vectors, and the G-buffer render target that contains roughness. You can specify which color channel to sample roughness from via a flag at context creation. You also need to provide the shading output from the previous frame, as well as an environment map to sample on ray misses. Then provide the two output resources that will store the diffuse and specular GI outputs, pass the camera matrices and the constants required by Brixelizer, such as the start and end cascade indices, and finally call ffxBrixelizerGIDispatch with the context, dispatch description, and your command list.

After the update, you composite the diffuse and specular GI outputs into your direct lighting output the same way you would with image-based lighting, usually by plugging them into Epic's split-sum approximation.
Now, using just the direct lighting from the previous frame gives you one-bounce diffuse GI. However, if you want multi-bounce GI, you can create a feedback loop by using the composited output from the previous frame as the input to Brixelizer GI. Here you can see what the radiance cache and the diffuse GI output look like with a single bounce, and here is what they look like with multi-bounce. The difference is subtle in the diffuse GI output, but it can help in scenes that are mostly indirectly lit.

There is also an optional debug visualization mode that you can use to inspect the world space radiance and irradiance caches, as you can see here. For this update, you assign the Brixelizer context, depth buffer, G-buffer normals, camera matrices, and an output resource, and you choose which debug mode to use, either radiance cache or irradiance cache. The dispatch then outputs the chosen information directly into the given output resource.

Now, let's look at a live demo of Brixelizer GI in action. Here is the Brixelizer GI sample running the Sponza scene, currently with GI turned on, and here is what it looks like with GI turned off; as you can see, most of the scene is in shadow. Turning GI back on, the shadowed areas become filled with indirect lighting. Here is the diffuse GI by itself, here is the specular GI, and here is the radiance cache. Turning off multi-bounce GI shows that the areas previously lit by secondary bounces now fall completely into shadow. As the camera turns around, the radiance cache gets updated with new radiance data. Turning multi-bounce back on creates the feedback loop where we feed the previous frame's composited shading output, that is, the final shading output composited with the diffuse and specular GI outputs, back into Brixelizer GI in the next frame, which adds secondary bounces to the radiance cache. When we change the direction of the light, you'll notice that the GI changes with it. Here we can get a better look at the GI: in the diffuse GI, you'll notice the bounce lighting from the curtains on these pillars. And that concludes the Brixelizer GI portion of the talk.
Now let's go over the upcoming version 1.1 release of the FidelityFX SDK. Two of the biggest additions in this release are the Brixelizer and Brixelizer GI libraries. They come with DirectX 12 and Vulkan samples that demonstrate their usage, show how to handle static and dynamic scene elements, and show how to use the various debug views and debug counters to help integrate the libraries into applications.

Another new library is FidelityFX Breadcrumbs. With modern explicit graphics APIs, it's easy to introduce GPU crashes through incorrect use of the API. In these cases, the operating system restarts the GPU via timeout detection and recovery, or TDR, which terminates all GPU processes. Debugging these kinds of issues becomes quite difficult, especially in a complex renderer with hundreds of draw calls and dispatches. This new library aims to help developers debug these GPU crashes using a technique called breadcrumbs: essentially, writes to a special buffer that happen before and after the GPU workload in question is executed. When the device is lost, we can dump these breadcrumbs and see which commands completed and which were in flight at the time of the crash, reducing the time it takes to diagnose the issue. This release also includes DirectX 12 and Vulkan samples that demonstrate the use of Breadcrumbs.

You may be asking how this differs from Radeon GPU Detective and DRED. Radeon GPU Detective is part of the Radeon Developer Tools suite; it works on Radeon GPUs and requires no changes to game code, since it connects directly to the driver. Breadcrumbs and DRED, on the other hand, are vendor agnostic and are implemented using the WriteBufferImmediate functionality in DirectX 12. However, with DRED you cannot control the granularity, so it inserts writes for every API call, and you cannot specify names for these writes. With the Breadcrumbs library, you can choose the level of granularity that suits your needs, such as per pass or per draw, using the ffxBreadcrumbsBeginMarker and ffxBreadcrumbsEndMarker APIs. This can easily be integrated by plugging it into your existing profiler markers and enabling them when needed.
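The breadcrumb idea can be illustrated with a small CPU-side simulation: surround each workload with a begin marker and an end marker, and after a device loss, the first marker that was begun but never ended points to the workload that was in flight. The real library performs these as GPU-visible buffer writes (for example via WriteBufferImmediate on DirectX 12) through its own API; the class and names below are purely illustrative.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy breadcrumb tracker: records a "begin" before each workload and an
// "end" after it. On a simulated crash, a marker that was begun but never
// ended identifies the workload that was executing.

struct Marker {
    std::string name;
    bool begun = false;
    bool ended = false;
};

struct Breadcrumbs {
    std::vector<Marker> markers;

    size_t begin(const std::string& name) {
        markers.push_back({ name, true, false });
        return markers.size() - 1;
    }
    void end(size_t id) { markers[id].ended = true; }

    // After a device loss: the first marker begun but not ended
    // is the workload that was in flight at crash time.
    std::string inFlight() const {
        for (const Marker& m : markers)
            if (m.begun && !m.ended) return m.name;
        return "";
    }
};
```

In a real integration these begin/end calls map naturally onto existing per-pass profiler markers, which is exactly the integration path suggested above.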
In addition, there are improvements to the FSR and Hybrid Reflections samples, as well as various other fixes and improvements across all the samples in the Cauldron framework. There is also a new GDK backend in the SDK, which supports both desktop and, in a version available to registered Xbox developers, Xbox Series consoles. Finally, FidelityFX SDK 1.1 is scheduled to be released sometime soon after GDC.

So, in conclusion, Brixelizer GI is a library meant as a fallback for ray-traced global illumination on lower-end platforms. It takes the G-buffer and direct lighting as input and outputs diffuse and specular GI. It requires no hardware-accelerated ray tracing support and works on both DirectX 12 and Vulkan. It's fully open source under the MIT license and will be available with FidelityFX SDK version 1.1. Finally, it can serve as the basis for a more advanced GI solution if needed.

A special thanks to these people from AMD for their contributions to Brixelizer GI and the FidelityFX SDK. And that concludes the talk. Please be sure to visit the GPUOpen website for more information. Thank you.
