
GDC 2024 - High Performance Rendering in Snowdrop using AMD FidelityFX™ Super Resolution 3 (FSR 3)

AMD FidelityFX™ Super Resolution 3 (FSR 3) is AMD's open-source upscaling and frame interpolation solution. Massive Entertainment integrated AMD FSR 3 technology into the Snowdrop engine, which supports a variety of titles and platforms. This talk covers how the integration was done to ensure FSR 3 fits well into their pipeline, which issues they faced, and how the integration helped improve FSR 3 itself during its development. Download the slides: https://gpuopen.com/gdc-presentations/2024/GDC2024_High_Performance_Rendering_in_Snowdrop_Using_AMD_FidelityFX_Super_Resolution_3.pdf


Hi, my name is Colin Riley and I manage the Core Technology Group at AMD, which is responsible for developing and releasing the FidelityFX technologies that you see in games. Today I'm going to present about FSR 3 before Hampus discusses the Snowdrop integration. So how did we get here? We released FSR 1, our spatial upscaling solution several years ago. It was very easy to integrate with a single compute shader for upscaling, and it can be placed late in the graphics pipeline for performance sc
alability. The downsides are that it required a good anti-aliasing implementation to really shine. With FSR 2, we added the temporal element, and so it replaces Temporal Anti-Aliasing in the game render pipeline. It is a purely analytical solution and open source with a permissive license. Today with FSR 3, we added frame generation, which means it touched upon many new areas of the graphics pipeline with new challenges for integration. Again, with FidelityFX, it's open source with a permissive
license. The development of FSR 3 started in 2022 with a proof of concept. A small team worked on it through to release in 2023, including integrations into our preview titles in late September. There were no changes to upscaling, except for some fixes for the new native AA mode. The goal was cross-platform frame generation for all, to ensure the widest possible uptake across many gaming platforms. When considering the generation of new frames, there are two paths you can go down: interpolation and extrapolation. Interpolation takes data from two confident data points, frames N and N + 1, in order to generate a new frame, G. This adds some latency, because you cannot show the newly generated frame until after you have rendered the real frame that follows it, N + 1. Extrapolation generates data beyond your last data point, and as such has lower latency because you do not need frame N + 1 to generate frame G. However, you only have one set of data points, so you have lower confidence, and as such it can be a more computationally expensive operation.
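As a rough illustration of the difference (a toy sketch with invented names, not FSR 3 code): interpolating a pixel's motion needs both bracketing frames, while extrapolating needs only the history up to frame N.

```cpp
#include <cstdio>

struct Vec2 { float x, y; };

// Interpolation: blend between two known samples (frame N and frame N+1).
// Requires frame N+1 to exist, hence the added latency.
Vec2 interpolate(Vec2 posN, Vec2 posN1, float t /*0..1*/) {
    return { posN.x + (posN1.x - posN.x) * t,
             posN.y + (posN1.y - posN.y) * t };
}

// Extrapolation: project forward using only past samples (frames N-1 and N).
// No need to wait for frame N+1, but the guess can be wrong for non-linear motion.
Vec2 extrapolate(Vec2 posNm1, Vec2 posN, float t /*0..1 beyond frame N*/) {
    return { posN.x + (posN.x - posNm1.x) * t,
             posN.y + (posN.y - posNm1.y) * t };
}

int main() {
    Vec2 a{0.f, 0.f}, b{2.f, 1.f};
    Vec2 mid  = interpolate(a, b, 0.5f);  // halfway between the two real frames
    Vec2 next = extrapolate(a, b, 0.5f);  // guessed half a frame beyond frame N
    std::printf("interp (%.1f, %.1f)  extrap (%.1f, %.1f)\n", mid.x, mid.y, next.x, next.y);
}
```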
Regardless of which path you choose, both have complex challenges around how you present the generated and real frames to the user. FSR 3 uses frame interpolation. Whilst it does have a latency cost, the higher confidence in the data means there is potential for fewer artefacts for a given unit of GPU cost, and less reliance on in-painting, a process I'll describe in more detail later. The challenges are in interpolating non-linear motion, high frequency artefacts, and game motion vectors not matching what's on screen. Perspective projection and camera movements make even static objects move in a non-linear way. We can somewhat improve the results of approximating non-linear motion by ensuring our input frame rate is high enough, hence our guidance of a 60 FPS minimum render frame rate for the best gaming experience. You can see here in the diagram that as the sample FPS increases, the interpolated motion path more closely approximates the actual motion path.
The latency aspect is commented on often, but from what we have seen, the latency cost should remain less than if the game were rendering at its native display resolution. We released this diagram last year highlighting where the FSR 3 frame generation latency values should sit. When running FSR 3 upscaling and frame generation, the perceived input-to-photon latency should fall within a range bounded on the low side by the game running with only FSR 3 upscaling, and on the high side by the game running without upscaling or frame generation at the native display resolution. The motion vectors used for frame interpolation are the same low resolution ones used for upscaling. They are used for disocclusion and reprojection operations within the frame generation algorithm. The vectors will not map directly to final colours because UI and post-processing effects such as motion blur are not considered; these occur post-upscale. Other items that may not have motion vectors are transparent objects and the shadows of moving objects. Frame generation needs more information to assist with creating accurate interpolated frames, and this is where Optical Flow can help. Optical Flow is a technique which tries to identify objects and estimate their motion from one frame to the next. For FSR 3, we took the motion estimation algorithm from the OpenCL Video Fluid Motion Frames codebase and experimented extensively.
We expanded upon its capabilities to add smaller block sizes and an adjustable search radius. There were various trade-offs to be made between resolution, false positive movements, and minimum object size, but we settled upon a single mode which, when combined with aspects of the game motion vectors, gives us excellent results across a wide range of input content, all at a very low GPU compute cost. The Optical Flow vectors and game motion vectors are combined into a motion vector field using various heuristics. At its most basic, you can think of it as biasing more towards the Optical Flow vectors when we have low confidence in the game motion vectors. We can test for confidence by simply reprojecting our colour sample from one input frame to the next and testing against the real frames.
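A minimal sketch of that idea, with invented names and a simple linear blend standing in for FSR 3's actual heuristics: reproject the previous colour with the game motion vector, compare it against the real current frame, and lean on the optical flow vector when the match is poor.

```cpp
#include <algorithm>
#include <cmath>

struct Vec2 { float x, y; };
struct Vec3 { float r, g, b; };

// Blend game motion vectors and optical flow vectors per pixel,
// weighted by how well the game vector reprojects the previous colour.
Vec2 BuildMotionFieldVector(Vec2 gameMv, Vec2 opticalFlowMv,
                            Vec3 reprojectedPrevColour, Vec3 currentColour)
{
    // Crude confidence metric: small colour error => trust the game vector.
    float err = std::abs(reprojectedPrevColour.r - currentColour.r)
              + std::abs(reprojectedPrevColour.g - currentColour.g)
              + std::abs(reprojectedPrevColour.b - currentColour.b);
    float confidence = std::clamp(1.0f - err, 0.0f, 1.0f);

    // High confidence -> game motion vector; low confidence -> optical flow.
    return { opticalFlowMv.x + (gameMv.x - opticalFlowMv.x) * confidence,
             opticalFlowMv.y + (gameMv.y - opticalFlowMv.y) * confidence };
}
```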
When using the motion vector field to generate interpolated frames, we can get most of the way towards creating an accurate new frame. So now we have a good amount of data with which to generate our interpolated frame. We use the motion vector fields, disocclusion masks, and reprojected colours, and then we in-paint any areas of low confidence by taking image data from a MIP pyramid that we generate. At this stage, areas of low confidence are mostly those that are occluded in both the previous and next frames, so the algorithm makes a best guess, and it's usually very hard to see any issues when in motion. There are issues that remain in areas around fast dynamic shadows and effects like vignettes. In the shadow case, you will likely see a double shadow effect rather than a solid line. We chose to focus more effort in FSR 3 on the user interface. From the start, we realised that artefacts in the scene are one thing, but artefacting UI would be the most distracting to gamers. UI can easily be corrupted: game motion vectors will not map to the UI, and optical flow may be too coarse to help with UI stability. So we chose to treat UI differently. FSR 3 has many different modes for compositing or redrawing UI onto the generated and real frames.
We have a basic mode which simply takes a render-rate UI plane and composites it onto both real and interpolated frames. We also have a path which takes a pre-UI scene as well as a presentation surface and tries to treat UI areas differently during interpolation. These can both work well but have downsides. They are both render rate instead of display rate, so they may look choppy if the UI has animations, and they also restrict the effects you can use. If the UI has drop shadows blended with the blurred scene behind it, these blends may carry over and show up as artefacts. For artefact-free display-rate UI, we recommend using our callback, which allows developers to re-render UI on top of the generated frame surfaces. This can yield smooth animations with no artefacts but will take longer to integrate. The Unreal Engine 5 plugin has different modes of re-rendering UI, and it's best to refer to the plugin documentation for the specific options available on that platform.
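A hypothetical sketch of the callback approach, with invented types and function names rather than the actual FidelityFX SDK signatures: the engine registers a function that the presentation layer invokes for every surface it is about to present, so the UI can be re-rendered at display rate.

```cpp
#include <functional>
#include <vector>

// Invented types standing in for engine/SDK objects.
struct CommandList {};
struct Texture {};

// Hypothetical hook: the presentation layer calls every registered callback for
// each surface (real or interpolated) right before presenting it.
using UiRenderCallback = std::function<void(CommandList&, Texture& backbuffer, double frameTimeSec)>;

static std::vector<UiRenderCallback> g_uiCallbacks;

void RegisterUiRenderCallback(UiRenderCallback cb) { g_uiCallbacks.push_back(std::move(cb)); }

void PresentSurface(CommandList& cmd, Texture& backbuffer, double frameTimeSec)
{
    for (auto& cb : g_uiCallbacks)
        cb(cmd, backbuffer, frameTimeSec);  // UI re-drawn at display rate
    // ... hand the surface to the real present call ...
}
```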
Now that we have our generated frame, we need to present it to the screen. For this, we need to introduce waits into the presentation system in order to pace the present calls. If we did not wait, especially in the VSync-off case, frame generation can happen so fast that in many cases either a torn fully generated frame or a torn fully real frame would be seen, and gameplay would not be smooth. Adding waits without decoupling presentation from the game loop would introduce performance degradation, so decoupling is what we do. Our frame interpolation swap chain handles this decoupling so that developers can integrate quickly and identify issues. We have previously mentioned that graphical game overlays can impact our frame pacing and our presentation. You can run benchmarks without the overlay present, and this is what we'd recommend when looking into FSR 3. There are actually a few more things that can break presentation, such as not running in full screen and using solutions that hook things like the DX12 runtime.
We've seen these actively change the sync interval and the tearing flag settings that we provide to present calls, which created stutter and frame tearing. The reason these overlays can impact pacing is that they operate outside of our area of influence. Being cross-platform, we can only see what happens, and account for it, before the driver present calls. This is why we have these pacing waits in the FSR 3 asynchronous swap chain. In these pacing waits we can account for workloads like the UI callbacks, but any additional GPU or excessive CPU workloads that occur after the present calls may not be accounted for, so our timing of these waits can be thrown off. This can introduce stutter or even screen tearing, especially when running in variable refresh rate mode. The waits themselves are determined by a rolling average of the previous frame times.
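As a rough sketch of that pacing idea (illustrative only, not the actual FSR 3 swapchain code): keep a rolling average of recent frame times, and pace the generated present roughly half an average frame after the real one.

```cpp
#include <chrono>
#include <deque>
#include <numeric>

// Rolling average of the last N real-frame durations, used to decide how long
// to wait before presenting the interpolated frame and then the real frame.
class FramePacer {
public:
    void OnRealFramePresented(std::chrono::nanoseconds frameTime) {
        history_.push_back(frameTime);
        if (history_.size() > kWindow) history_.pop_front();
    }

    // The interpolated frame should land roughly halfway between two real frames.
    std::chrono::nanoseconds InterpolatedFrameDelay() const {
        return AverageFrameTime() / 2;
    }

private:
    std::chrono::nanoseconds AverageFrameTime() const {
        if (history_.empty()) return std::chrono::nanoseconds{0};
        auto total = std::accumulate(history_.begin(), history_.end(),
                                     std::chrono::nanoseconds{0});
        return total / static_cast<long long>(history_.size());
    }

    static constexpr size_t kWindow = 8;  // arbitrary window size for the sketch
    std::deque<std::chrono::nanoseconds> history_;
};
```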
So this has been the hardest part of developing FSR 3. Originally, the plan was only to support legacy VSync on and off, but it was obvious that gamers really value variable refresh rates (VRR). During our implementation of this, we discovered that it's not simply the GPU hardware that can have adverse effects on VRR: hardware scheduling options in the operating system matter, and even two monitors supporting VRR with very similar specifications can have widely different behaviours. There are also plenty of monitors with refresh rate monitoring overlays, and we have found these to be sometimes inconsistent in how they work.
We've created many implementations of frame pacing in order to test how they looked to gamers. We tested various scenarios, including different input FPS such as 30-60, 45-90, 50-100, 60-120 and above. We used metrics, including our frame tearing indicators and frame pacing graphs, but mostly we relied on visual perception by real people, as this is fundamentally a personal question and many people see things differently. Our testing included implementations producing frame-time graphs like what you can see here. Some looked like a normal, non-generated frame-time graph. Some had wild frame-time spikes. But the ones which stood out, and actually the ones that we ended up shipping on GPUOpen, looked "fuzzy". They had a little bit more frame-to-frame variability, but seemed to handle changes in frame time from the game much better, which makes the difference when running on variable refresh rate monitors. And the goal really is to limit tearing, as well as present smooth experiences here. One last thing to note is that, where available, we saw that hardware scheduling can help with FSR 3 pacing, so make sure it's enabled in your system for the best experience. When it comes to performance, I think many will be surprised at how fast you can generate a new frame with FSR 3. It's around 1.5 milliseconds for a 4K frame on an AMD Radeon RX 7900 XTX, and that includes the Optical Flow passes. FSR 3 requires Shader Model 6.2, and in terms of memory use, FSR 3 frame generation can be enabled at 4K, with FSR 3 upscaling already running in performance mode, for as little as an additional 130 megabytes. This run-time cost can be hidden by using asynchronous compute and overlapping it with some low occupancy workloads in your game frame. However, we recommend integrations first use the non-asynchronous path to validate generated frames and pacing beforehand. This is because resources need to live past the end of a game frame for safe async compute usage, and sometimes that does not play well with game systems for resolution changes and window events. Those events can be isolated later in the integration, and async compute can then be attempted for maximum performance. When it comes to latency, we do see that enabling interpolated frames adds some perceived latency. However, it does usually remain within our previously stated window: more than with upscaling alone, but lower than native rendering at the desired display resolution. This was certainly the case for our first two released integrations. Once FSR 3 is integrated, any issues can be debugged via internal features. There are tear lines that can be drawn at the edge of the screen to identify poor pacing, and there is a debug view, presented here, which can be used to identify other issues in the algorithm. It can be enabled by configuring FSR 3 at run-time with the draw debug view flag. Interesting items are the game motion vector display, which can reveal issues if you see game items in motion that do not have motion vectors here. The Optical Flow vectors are top right; these should identify larger particle movement, as shown. If there is some ghosting, sometimes this can be due to the disocclusion masks being too conservative, and this can indicate that the depth input flags are incorrect. Lastly, if the HUD-less UI mode is used, there is a way to see the input with and without UI. Now please let me introduce Hampus, who will presen
t how FSR 3 was integrated into the Snowdrop Engine. Hello, my name is Hampus Siversson, and I'm an associate lead engine programmer at Massive Entertainment. Today, I will be talking about our integration of FSR 3 into Snowdrop. A quick disclaimer though, this presentation is not about the FSR technology itself, but rather more about the practical side of things and what we learned during the integration. Let's take a quick look at the agenda for today. We'll start with a small intro on the Sno
wdrop engine and our motivation. We'll take a look at some integration details, then we'll have a look at the external resources used by FSR and how we handle them in Snowdrop. We'll take a look at how everything fits into the Snowdrop rendering pipeline, and lastly, we'll finish up with some extra details on the new addition to the FSR SDK - frame generation. So what is Snowdrop? Snowdrop was initially built for Tom Clancy's The Division, and ever since, it has been a key piece of technology at
Ubisoft. The latest iteration shipped with Avatar: Frontiers of Pandora late last year, and this is also where we debuted the FSR 3 implementation. There are also a bunch of exciting upcoming projects using Snowdrop, so we have a bright future. Ok, so why did we look into FSR? Well, after The Division 2, we were looking for a performance solution to either upgrade or replace our existing TAA solution. This TAA already had an upscaling component, but the quality was not where we wanted it to be. In
addition, we needed a solution that could be used on all our major platforms to achieve a unified result. So we landed on FSR, actually FSR 2 at the time, but we liked the result so much that we made it our default solution for both anti-aliasing and upscaling, and this is true even today. And of course we use it on all our major platforms. So let's look at some details then. The FSR SDK ships with two native backends, DX12 and Vulkan. Both of them are production ready and shippable. So you cou
ld essentially just integrate them and ship your game with them. However, for us, we wanted a bit more control than that. Snowdrop, like many other engines, has its own high-level wrapper around the platform-specific APIs. This layer usually contains a lot of extra functionality tailored for the engine itself. So we thought, why don't we just use all of this functionality for FSR as well? Well, luckily for us, the FSR SDK provides a pretty straightforward way to hook your own renderer into the SDK. By using the FFX interface, you can override a bunch of functions to essentially take control over the entire SDK. And this opens up all sorts of functionality, such as debug visualization, profiling, memory management - you name it! These hooks are defined when we're setting up the FFX context; more specifically, these three lines.
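As a rough sketch of that idea (the struct and member names below are illustrative, not the exact FidelityFX SDK identifiers): the backend interface handed to context creation is essentially a struct of function pointers, and overriding a few of them routes resource creation and job execution through the engine's own renderer.

```cpp
#include <cstdio>

// Illustrative only: a simplified stand-in for the SDK's backend interface
// struct of function pointers; real integrations fill in the FidelityFX-provided
// interface instead of this hypothetical one.
struct BackendInterface {
    void* (*createResource)(const void* desc, void* userData);
    void  (*executeGpuJobs)(void* commandList, void* userData);
    void* userData;  // engine renderer handle passed back into every hook
};

// Engine-side hook implementations (stubbed for the sketch).
void* SnowdropCreateResource(const void* /*desc*/, void* /*userData*/) {
    std::puts("engine allocates the resource with its own allocator");
    return nullptr;
}
void SnowdropExecuteGpuJobs(void* /*commandList*/, void* /*userData*/) {
    std::puts("engine records and dispatches the scheduled FFX jobs");
}

void SetupBackend(BackendInterface& iface, void* engineRenderer) {
    // The "override a bunch of functions" idea: point the SDK's hooks
    // at the engine's renderer before creating the FFX contexts.
    iface.createResource = &SnowdropCreateResource;
    iface.executeGpuJobs = &SnowdropExecuteGpuJobs;
    iface.userData       = engineRenderer;
}
```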
Internally, FSR uses multiple contexts for each feature. So, there are actually three contexts here: the shared one, the upscale context, and the frame interpolation context. This is mostly a memory thing, and the hooks are the same across all of them, so don't think too much about that right now; we'll touch upon it again later in the presentation. Alright, so let's take a look at an example of how one of these hooks could look. More specifically, we will look at the execute render jobs hook, which is the one responsible for dispatching the compute shaders. This is roughly how the implementation side looks in Snowdrop, and I'll walk you through it.
The first thing we do is loop through all the scheduled jobs. For each job, we check which type it is and take the corresponding action. So for example, if you have a clear job, you run the clear operation, or if you have a copy job, you run a copy operation. But the more interesting one here is actually the compute type, which we'll take a closer look at. Looking at the execute compute job function, there's a bunch of things happening. By hooking our own callbacks, we get fine-grained control over all the scheduling, so we can also make decisions to change behaviours or add things like quality-of-life improvements. The first thing that we do is gather resource information. We have actually moved the entire FSR implementation over to bindless, so this part is where we extract the descriptor indices. For each resource we also run a specific function that we call MarkResourceForBindless, which essentially just transitions the resource into the state that you want and checks for residency. The next thing we do is bind all the SDK-provided constants, and then of course we bind our own constants that we added for all of the extra functionality, such as the bindless descriptors. We also pack some extra metadata in here, which I will touch upon later. The last thing we do is set the shader state and dispatch the shader.
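A condensed sketch of what such an execute-jobs hook can look like (hypothetical types and helper names, not Snowdrop's or the SDK's actual code): iterate the scheduled jobs, handle clears and copies, and for compute jobs gather bindless indices, bind constants, and dispatch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for SDK job descriptions and an engine command list.
enum class JobType { Clear, Copy, Compute };
struct GpuJob {
    JobType type;
    std::vector<uint32_t> resourceIds;   // resources referenced by the job
    uint32_t dispatchX = 1, dispatchY = 1, dispatchZ = 1;
};
struct CommandList;  // engine-side, opaque here

// Engine helpers assumed to exist for the sketch.
uint32_t MarkResourceForBindless(CommandList&, uint32_t resourceId);  // transition + residency check, returns descriptor index
void BindConstants(CommandList&, const void* data, std::size_t size);
void Dispatch(CommandList&, uint32_t x, uint32_t y, uint32_t z);
void RunClear(CommandList&, const GpuJob&);
void RunCopy(CommandList&, const GpuJob&);

void ExecuteGpuJobs(CommandList& cmd, const std::vector<GpuJob>& jobs)
{
    for (const GpuJob& job : jobs) {
        switch (job.type) {
        case JobType::Clear: RunClear(cmd, job); break;
        case JobType::Copy:  RunCopy(cmd, job);  break;
        case JobType::Compute: {
            // Gather bindless descriptor indices for every referenced resource.
            std::vector<uint32_t> descriptorIndices;
            for (uint32_t id : job.resourceIds)
                descriptorIndices.push_back(MarkResourceForBindless(cmd, id));

            // Bind SDK constants plus the engine's extra (bindless) constants,
            // then dispatch the shader for this job.
            BindConstants(cmd, descriptorIndices.data(),
                          descriptorIndices.size() * sizeof(uint32_t));
            Dispatch(cmd, job.dispatchX, job.dispatchY, job.dispatchZ);
            break;
        }
        }
    }
}
```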
Of course, this is just an example; you can implement this functionality however you like and in whatever way fits your renderer. Let's move back to the callbacks and take a look at resources and pipelines. Most of the internal resources in FSR are created during context creation. The only exception is the external resources, which are registered at runtime. But we'll come back to those a bit later. For now, let's take a look at creating resources. By taking control over the resource creation, you get fine-grained control over all the allocations.
You can use your own custom resource types and you can initialize all of them based on their usage, or however you prefer to initialize them. Some resources are not required to be kept in memory indefinitely, so we can discard their memory allocations when they're not needed. If the aliasable flag is specified on the incoming description, it means that the resource can be temporary and we can allocate it just in time. This is something that fits very well with our internal temp resource solutions, and it allows us to minimize the memory overhead. The temp resources are allocated just before the FFX dispatch call, and this is done for each context. Right after the dispatch call, we discard them.
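A small sketch of that create-resource hook (invented names; the real flag and description structures come from the FidelityFX SDK): aliasable resources are only recorded at creation time and get backing memory just around the dispatch.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical resource description and handle types for the sketch.
struct ResourceDesc {
    std::string name;
    uint64_t    sizeBytes = 0;
    bool        aliasable = false;  // "can be temporary, allocate just in time"
};
struct ResourceHandle { uint32_t id = 0; bool backed = false; };

class FfxResourceBridge {
public:
    ResourceHandle CreateResource(const ResourceDesc& desc) {
        ResourceHandle h{ nextId_++, false };
        descs_[h.id] = desc;
        if (!desc.aliasable)
            Allocate(h);            // persistent resources get memory immediately
        return h;                   // aliasable ones stay unbacked for now
    }

    // Called just before the FFX dispatch for a context: back the temp resources.
    void AllocateTemporaries() {
        for (auto& [id, desc] : descs_)
            if (desc.aliasable) Allocate(ResourceHandle{ id, false });
    }

    // Called right after the dispatch: release the temp allocations again.
    void DiscardTemporaries() {
        for (auto& [id, desc] : descs_)
            if (desc.aliasable) Free(id);
    }

private:
    void Allocate(ResourceHandle) { /* engine allocator, omitted */ }
    void Free(uint32_t)           { /* engine allocator, omitted */ }

    uint32_t nextId_ = 1;
    std::unordered_map<uint32_t, ResourceDesc> descs_;
};
```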
Ok, so we had a quick look at resource creation; let's take a look at the shaders then. In Snowdrop, we use a custom intermediate language for shaders, which we call MShaders. It is roughly based on HLSL syntax. However, FSR ships with native HLSL source, so naturally we need to find a way to convert an HLSL shader file into an MShader file. Of course, during this conversion we have the perfect opportunity to inject some extra functionality. Some of the extra things that we added were the bindless accesses and a couple of custom loading functions that we run. FSR shaders also have a couple of shader options that can be specified on context creation. These options drive which permutation of the shader we are going to use. These permutations need to be respected by our conversion process as well, so that we have all of the permutations neatly converted and stored on disk.
As I mentioned, we converted all resource accesses to bindless. This was done mainly to avoid running out of local registers, but we also saw a measurable performance boost on the CPU. Here's an example of how we handle it. Take the LoadInput function as an example; in the case of the FSR 3 upscaler depth clip shader, we can see what the final shader looks like when it's resolved. Note how we resolved the defines and inserted a hard-coded index for the descriptor index lookup. So what does the actual conversion process look like then? Well, for each permutation, we first gather resource binding information from the native shader file. We need this to accurately bind the bindless information that I showed earlier. Then we preprocess the native shader to resolve any preprocessor directives. The preprocessed code is then parsed by our own shader parser into an AST-like representation. Once we have the shader properties, we can convert the parts that we want to convert and strip out things that we don't need, such as the native resource bindings. When this is finished, we have a proper MShader file that we can write to disk. Ok, so let's move forward to some external resource requirements then. By external, I mean not internal to FSR. So these are actually resources that need to be provided to FSR from your own renderer. As I mentioned before, we have different contexts, and they are used t
o store different external resources or different internal resources as well. So we have the shared context, which shares data between both the upscaler and the frame interpolator. And this one holds the depth and the motion vectors. We have the upscaling context, which holds two textures that are actually optional. And those are the reactiveness and the transparency and composition. We don't really use the transparency and composition, so I didn't include a picture of it. And this is mostly bec
ause we didn't find a good use for it yet. For frame generation, we need the HUD-less game scene, which is the renderer output without any UI composited. And then of course, the separated UI texture to allow for deferred composition. Okay, so let's dive into some details about the motion vectors specifically. Motion vectors are precision critical. The upscaling algorithm is sensitive to errors. If you have any errors in your output, the frame interpolation on top of it would make everything wors
e. In Snowdrop, we have something called vertex modifiers, which are just artist-driven shaders that drive the vertex positions. Those can be very problematic, especially if you have some form of mismatching last frame data. This would mean that the reprojected position that is used to build a motion vector would be incorrect. And this would lead to, of course, incorrect motion vectors, or even sometimes they might be entirely missing. The native Snowdrop motion vector resource encodes additiona
l metadata. This makes it problematic to pass into external effects. In general, this would mean that we need a decoding prepass to decode the motion vector information and store it in a separate texture. This texture would then be used as an input to FSR. But since we build our own shader variations and we hook our own render API, it is actually much more convenient for us to just sample this global motion vector resource directly and account for the decoding. This avoids both the extra memory
footprint and the decoding cost. By the looks of the picture, this is just a rough visualization; the encoding for the motion vectors is actually much more complex than this, but for simplicity, this is how it looks. Okay, so how does it actually look in the shaders then? Take the LoadInputMotionVector function as an example. This is the part that we actually changed. What we do is sample the global motion vector resource, decode the data, extract just the offset vector, and disregard all the metadata. Then we take the offset vector and pass it along to the regular FSR code. As mentioned before, this allows us to bypass the need for a prepass, so we get some extra performance back from it. Ok, so let's take a look at reactiveness. What is reactiveness? Well, it's a resource that drives the history lerp within the upscaling algorithm, where 1.0 means use no history and 0.0 means fully use the history. In Snowdrop, reactiveness is actually binary and we include it in our stencil buffer. It is a legacy approach and it's somewhat problematic for certain parts of FSR. Therefore we provide a global reactiveness value that we use wherever pixels are marked. This works for us because reactiveness is mostly used for particles and not a lot of other things. But this is actually changing a bit, so in the future we want this texture to move to a full UNORM texture so that we get finer-grained control over the history. Ok, so how do
es this look in the shaders then? Well, it's very similar to the motion vectors. We just use custom sampling. So what we do is that we read the global stencil texture and we return the global reactiveness value if we have a match. Ok, so where does all of this fit into the Snowdrop rendering pipeline then? Well, we have two key points that we want to keep track of. We have the pre-post effects point and we have the presentation point. Let's take a full frame of our in-game profiler or our in-edi
tor profiler from a snippet of Avatar: Frontiers of Pandora. This marked point or this marked area is actually showing the work that's done on the GPU. All other work is CPU-side work, but we will be focusing mostly on the GPU-side here. Snowdrop is using a deferred renderer and as per usual you will find the most common passes in our timeline as well. From left to right you have depth, shadows, g-buffer, ray tracing, deferred lighting, forward rendering, and post effects. The FSR upscaling happ
ens just after we've finished our forward rendering pass, but before UI and post effects. For frame generation, frame interpolation work is scheduled during the task that normally would present your frame. However, due to the asynchronous nature of frame generation, presentation is deferred and we run the interpolation work asynchronously with the next frame's graphics work. Once the interpolation task is finished, we can present our interpolated frame, respecting the pacing requirements of cour
se, hence the offset from the interpolation task itself, and then the real frame gets presented once the next pacing interval has finished. As you can see, our layout closely resembles AMD's visualized chart for the frame generation pipeline. Alright, so let's dive into some details about frame generation then. Frame generation is a technique that analytically generates extra frames. It improves the display rate and makes the game feel more fluid. FSR 3 ships with a frame generation swapchain an
d that one works great, but it's built on top of the DX12 interfaces. To get better control, we actually decided to take the implementation one step up into our own swapchain wrapper layer. This also makes sense since we hooked the FSR API into our own render backend. However, it comes with the quirk that you need to do your own pacing. And frame pacing is tricky. Oh, and one more thing. By implementing the pacing and interpolation code as part of our own swapchain wrappers, we've now opened the
door for platform-agnostic frame generation. But that's something to talk about in the future. Okay, so the custom swapchain implementation then. Our implementation is heavily inspired by AMD's swapchain implementation, but we needed to add a few tweaks here and there to make it fit our own pipeline. The most notorious issue that we had was fences. This was mostly due to our fence validation not being prepared for new command queues and indirect dependencies. Because we wrap our fences in a very render-task-oriented way, we don't really have control over the fence values themselves, and thus we cannot predict future fences either, which is something you would want to do in this kind of implementation. So we actually hacked around it a bit by using temporary resources for the offending fence validations, and some of the validation errors we just ignored. The pacing thread needs to control the pacing of the frame presentation.
So we have a specific thread that controls this pacing, and it therefore also needs to introduce pacing stalls to ensure that we have an even distribution of all the presentations. The most accurate way of stalling is to spinlock. However, spinlocking in itself can be problematic, since it uses 100% of the thread until the spinlock exits. We weren't too happy with this, so we decided to investigate what we could do about it. We tested various ways to sleep the thread to allow other threads to gain focus, but in most cases this was too inaccurate and we didn't wake up the pacing thread in time. The best solution we found was a hybrid sleep, which essentially sleeps in intervals. It measures each sleep to see how much it overshot, and when we hit a certain threshold we move over to spinlocking for an accurate wakeup. This roughly gave us an efficiency increase of 80% or so.
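A minimal sketch of that hybrid sleep, assuming a standard C++ thread (the threshold and measuring logic here are illustrative, not Snowdrop's actual values): sleep in short slices while measuring how much each slice overshoots, then spin for the final, accuracy-critical stretch.

```cpp
#include <chrono>
#include <thread>

using Clock = std::chrono::steady_clock;

// Wait until 'deadline' without burning a full core the whole time:
// coarse sleeps first, then a spin for the last part.
void HybridSleepUntil(Clock::time_point deadline)
{
    using namespace std::chrono;

    // Rough estimate of how late a short sleep tends to wake up on this system.
    auto observedOvershoot = microseconds(500);

    for (;;) {
        auto remaining = deadline - Clock::now();
        // Once the remaining time is within the observed overshoot (plus margin),
        // stop sleeping and fall through to the spin.
        if (remaining <= observedOvershoot * 2)
            break;

        auto before = Clock::now();
        std::this_thread::sleep_for(milliseconds(1));  // short sleep slice
        auto slept = Clock::now() - before;

        // Track how much the 1 ms sleep actually overshot, for the next decision.
        if (slept > milliseconds(1))
            observedOvershoot = duration_cast<microseconds>(slept - milliseconds(1));
    }

    // Spin for the final stretch to hit the deadline accurately.
    while (Clock::now() < deadline) { /* busy-wait */ }
}
```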
On top of this, VSync makes everything harder. When you have VSync enabled, you have to respect the vertical blank. If you do not respect the vertical blank sync point, you might end up in a degraded-performance feedback loop, where you consistently miss the vertical blank because the previous frame was waiting for a vertical blank that it could have hit earlier. To counter this, we use a separate thread to accurately predict the timings for the pacing stalls, so the prediction is not part of the pacing itself. The calculations are excluded from that timeline, so in a way we can always accurately predict the vertical blank.
Your custom swapchain implementation also needs to support a pass-through mode. This is to avoid having to recreate the swapchain whenever frame generation is temporarily disabled. It is simply a presentation mode that does a regular present using the frame generation swapchain. No interpolation is scheduled when this is enabled, so you don't pay the cost of any of the interpolation work.
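A tiny sketch of that pass-through idea (a hypothetical engine-side wrapper, not the SDK's swapchain code): the same present entry point is used whether or not frame generation is active, so nothing needs to be recreated when toggling it.

```cpp
// Hypothetical wrapper around the engine's swapchain; names are illustrative.
struct Texture;

class FrameGenSwapchain {
public:
    void SetFrameGenerationEnabled(bool enabled) { frameGenEnabled_ = enabled; }

    void Present(Texture& realFrame)
    {
        if (frameGenEnabled_) {
            // Schedule interpolation against the previous real frame, then pace
            // and present both the generated frame and the real frame.
            ScheduleInterpolation(realFrame);
            PresentPaced();
        } else {
            // Pass-through: no interpolation work is scheduled, we just do a
            // regular present through the same swapchain object.
            PresentImmediate(realFrame);
        }
    }

private:
    void ScheduleInterpolation(Texture&) { /* dispatch FSR 3 interpolation */ }
    void PresentPaced()                  { /* wait for the pacing target, then present */ }
    void PresentImmediate(Texture&)      { /* plain present call */ }

    bool frameGenEnabled_ = false;
};
```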
But when you do need to recreate the swapchain, it can be quite a challenge. We do it to minimize the overhead when frame generation is disabled, and you need to do it at a convenient place in the frame, because it requires a CPU-to-GPU sync. It also requires a full reconstruction of the swapchain object, because you have to switch the command queue. If your application is running third-party software that somehow hooks into the application, that software might take a reference to your swapchain object and not release it properly, and it might throw you one of the errors, such as the one on the screen. If this happens, you have to abort the recreation and revert back to the original swapchain. So, let's talk a bit about compositing UI in frame generation. There are mainly two approaches to how you can composite UI: full rate and half rate. Let's take a quick look at how a full rate approach might look. Full rate compositing means that we run the full UI render for both the interpolated frame and the real frame. This part of the flowchart is what I'm talking about.
Note how we render the UI twice. For half rate compositing, we render the UI once and then use that result to composite both the interpolated frame and the real frame. And again, this is the part of the flowchart that I'm talking about. Note how the UI render is outside and we composite the result twice.
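In pseudo-C++ (illustrative structure only, with invented helper names), the two approaches differ in where the UI render sits relative to the two presents:

```cpp
// Hypothetical helpers; each presented frame is either interpolated or real.
struct Texture;
Texture& RenderUi();                                 // draws the UI for this frame
void Composite(Texture& frame, const Texture& ui);   // blends UI onto a frame
void Present(Texture& frame);

// Full rate: the UI is rendered separately for the interpolated and real frame,
// so UI animations advance on every presented frame.
void PresentFullRateUi(Texture& interpolated, Texture& real)
{
    Composite(interpolated, RenderUi());
    Present(interpolated);
    Composite(real, RenderUi());
    Present(real);
}

// Half rate: the UI is rendered once and the same result is composited onto
// both frames, which is cheaper but can clash with effects that sample the
// (now different) background, such as blurs behind UI elements.
void PresentHalfRateUi(Texture& interpolated, Texture& real)
{
    const Texture& ui = RenderUi();
    Composite(interpolated, ui);
    Present(interpolated);
    Composite(real, ui);
    Present(real);
}
```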
In our approach, we were initially looking at the full rate approach. However, we quickly realized that it would require substantial changes to our pipeline. Due to the deferred composition, it would force us to keep render data alive for longer than we usually do, increasing both the memory footprint and complexity. So for now, we simply went with the half rate approach, since it's a better fit for our existing pipeline. However, it still came with a cost, which is keeping an extra copy of the UI texture around for the deferred composition. We also have a bunch of UI elements that blur the background, and this can be very problematic when rendering at half rate, because the background actually changes between the two compositions. As you can see in the image, it can create very ugly seams around the UI elements. So in the latest patch of Avatar, we simply disabled that blur, because it wasn't contributing enough to be important. Ok, so let's conclude. AMD's FSR 3 solution has been a complete game changer for Snowdrop. It helped us open up performance we previously didn't have, while still keeping t
he visual quality on top. And on top of that, it has a really nice SDK that makes the integration very seamless. I would like to thank everybody who contributed to the integration, and a special thanks to AMD's DevTech team for holding our hand through the integration. And of course, to everybody else I've bothered during the development, thank you. Thanks, Hampus. There's one small thing left to discuss, and that's FSR 3.1. We've spent some time on the upscaler and have brought some new improvements which should assist in the area of temporal stability. We also have support for Vulkan, and Xbox developers will be pleased to know a version that supports Xbox Series will become available to them. In addition to general bug fixes, we now have the ability to separate FSR upscaling from frame generation, which allows for better interop with other upscaling solutions. Lastly, we are introducing the FidelityFX API, which we'll launch with FSR 3.1. This is an attempt to significantly lower our ABI surface and enforce DLL use across the board. This can unlock the possibility of upgrading FSR 3 DLLs in certain circumstances, and it will allow game developers to upgrade to new versions of FSR faster and with fewer code changes. We have integrated some upscaler improvements into FSR 3.1, which include internal changes to how we deal with high and low frequency input signals, so as to make better decisions in the algorithm. This should allow for better preservation of detail whil
st trying to reduce temporal instability and ghosting. In the example shown here, we see FSR 2.2 temporal instability being improved greatly with FSR 3.1, along with a reduction in ghosting. This footage was captured at 1080p in performance mode. We give thanks to our partners for allowing us to use and show these examples of upscaling improvement. A large change with FSR 3.1 is that we can now support third-party upscalers whilst enabling FSR 3.1 frame generation. We originally architected the
system so that frame generation workloads took intermediate data from the upscaling pass as a performance optimization. We now offer a new API entry point for generating that intermediate data separately, allowing game developers to use existing third-party upscaling solutions in combination with FSR 3 frame generation. The additional step of generating some intermediate data costs a small amount of GPU time, but we hope this additional flexibility allows for more integrations of our frame gener
ation solution. FSR 3.1 will be getting released on GPUOpen very soon, so keep up to date by following the GPUOpen social channels. This presentation on FSR 3 would not have been possible without these talented individuals and I thank them for their contributions. Thank you for your time and I look forward to seeing how game developers utilize FSR 3.1 in the future.
