
Exploring Bevy 0.8’s rendering process



Bevy’s rendering process isn’t particularly well-documented, and there aren’t great practical examples that show how it fits together either. As a learning exercise, and hopefully to help others, I decided to learn how it works and write what I discovered.

The first steps

Ideally, the first place to look would be the documentation. Unfortunately, the Bevy Book doesn’t say anything about the rendering process - or anything beyond the minimum necessary to get a Bevy example built. The Bevy Cheat Book documents a lot more about Bevy in general, but not much in its rendering section; it does, however, introduce Render Stages, which at least gets us started.

Render Stages

We can see that RenderStage is defined in the bevy_render crate.
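At the time of writing (Bevy 0.8), that enum looks roughly like this (doc comments paraphrased):

pub enum RenderStage {
    /// Extract data from the "app world" into the "render world".
    Extract,
    /// Prepare render resources from the extracted data for the GPU.
    Prepare,
    /// Create bind groups and queue up draw calls.
    Queue,
    /// Sort the RenderPhases here.
    PhaseSort,
    /// Actual rendering happens here.
    Render,
    /// Clean up render resources here.
    Cleanup,
}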

These rendering stages are the highest-level steps taken during rendering. Extract copies the data that will be used during rendering (which allows the original data to be modified in the meantime), Prepare does some processing of that data, Queue generates the GPU commands to be executed (draw calls), PhaseSort sorts something called RenderPhases, Render actually executes those GPU commands, and Cleanup performs post-render cleanup.

We can also see that the RenderStages are configured as part of the RenderPlugin plugin. Plugins are where the systems that run during Bevy execution are defined, so we’re definitely in the right place.

Unlike other stages, these aren’t on the main Bevy “app”, but rather a separate “render_app”, which is then added as a sub-application of the main one.

These Render Stages are hooks which the other Bevy plugins (and your own code) can use to run their own code. There is a simple example of adding a system to the Extract Render Stage to copy time from the main app to the render app every frame.
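As a rough sketch (the plugin and system names here are hypothetical, and the systems are stubs), hooking into these stages looks something like this:

use bevy::prelude::*;
use bevy::render::{RenderApp, RenderStage};

struct MyRenderPlugin;

impl Plugin for MyRenderPlugin {
    fn build(&self, app: &mut App) {
        // The rendering systems live in the render sub-app, not the main app.
        let render_app = app.sub_app_mut(RenderApp);
        render_app
            .add_system_to_stage(RenderStage::Extract, extract_my_data)
            .add_system_to_stage(RenderStage::Queue, queue_my_draws);
    }
}

// Placeholder systems: a real extract system copies data from the main world
// into the render world, and a real queue system pushes items into a RenderPhase.
fn extract_my_data() {}
fn queue_my_draws() {}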

The most interesting stages are likely Render and possibly Queue, so let’s see which plugins add systems to these stages.

It looks like RenderPlugin itself is the only plugin that hooks into Render. It only calls two systems: PipelineCache::process_pipeline_queue_system and render_system.

PipelineCache doesn’t have much documentation, but the code seems to be involved in ensuring that compute and render pipeline descriptors and the shaders they reference are configured on the GPU - not actually executing any rendering.

That leaves render_system, which is where we hit the jackpot - RenderGraphRunner does the heavy lifting based on the RenderGraph.

RenderGraph

RenderGraph has a good description of how it works in its source documentation. The executable logic (the actual draw calls) is in Nodes, and the dependencies between Nodes are stored as Edges. In addition, Nodes have input and output slots, allowing them to pass data between each other. Finally, there may be subgraphs.

RenderGraphRunner is responsible for taking all the configuration in the RenderGraph and executing it, which it does in run_graph, which is called recursively for subgraphs.

The game of life example shows how to add a new node to the render graph.
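The Node trait itself is small. As a minimal sketch (MyNode is a hypothetical, do-nothing node), an implementation looks like this:

use bevy::prelude::*;
use bevy::render::render_graph::{Node, NodeRunError, RenderGraphContext};
use bevy::render::renderer::RenderContext;

struct MyNode;

impl Node for MyNode {
    fn run(
        &self,
        _graph: &mut RenderGraphContext,
        _render_context: &mut RenderContext,
        _world: &World,
    ) -> Result<(), NodeRunError> {
        // A real node records GPU commands via render_context here.
        Ok(())
    }
}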

The Node trait is implemented in a few places that help to understand how Bevy’s default rendering is implemented:

This seems to make sense - you can imagine that you’d want something like this for a standard 3D game:

For a 2D game, you’d probably just have:

So what is CameraDriverNode? MainPass2dNode and MainPass3dNode seem to be doing all the work from the perspective of their cameras - why is there another camera node?

Those nodes aren’t actually added to the primary render graph - the core_2d and core_3d libraries create their own render graphs (per camera), and add them (without executing them) to the primary render graph. It can be hard to see, but line 64 is where graph (the primary render graph) is fetched, and it is only used once - to add the sub-graph, on line 80.

These subgraphs are executed in CameraDriverNode.

Getting practical (in theory) with RenderGraph

With this information, we can now speculate how to achieve some things.

For example, we could imagine that a Bevy debugging UI would have its own node, added to the render graph with an edge to the camera driver node so that it executes afterwards. And if you look at bevy_egui’s setup code, that’s exactly how it works. This ensures the debugging UI always appears above the standard rendered content and, in more complicated setups, doesn’t end up being rendered to some alternate target besides the primary window.
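A sketch of that wiring, reusing the hypothetical MyNode from the earlier sketch (and assuming I have the CAMERA_DRIVER constant path right), looks something like this:

use bevy::prelude::*;
use bevy::render::{main_graph, render_graph::RenderGraph, RenderApp};

fn setup_my_node(app: &mut App) {
    let render_app = app.sub_app_mut(RenderApp);
    let mut graph = render_app.world.resource_mut::<RenderGraph>();
    graph.add_node("my_ui_node", MyNode);
    // The edge runs from the camera driver to our node, so our node
    // executes after all the cameras have rendered.
    graph
        .add_node_edge(main_graph::node::CAMERA_DRIVER, "my_ui_node")
        .unwrap();
}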

We’ve already seen an example (game_of_life) of adding a Node to the primary render graph to perform some computation that can subsequently be rendered as a sprite for the core 2D rendering pipeline.

You can imagine that some post-processing effects (say, greying or blurring the screen during a pause menu) would be implemented as a Node that executes after the core rendering pipeline and uses its results.

You can even insert steps into the core 2D/3D rendering pipelines. Using RenderGraph’s get_sub_graph_mut method, you can grab those sub-graphs and insert new nodes into them. That’s how the shadow map pass is inserted for the PBR materials - the shadow maps need to be generated for use by the PBR shaders.
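A sketch of reaching into the core 3D sub-graph (again reusing the hypothetical MyNode, and assuming the graph constant paths are right for 0.8) might look like this:

use bevy::core_pipeline::core_3d;
use bevy::prelude::*;
use bevy::render::{render_graph::RenderGraph, RenderApp};

fn add_to_3d_graph(app: &mut App) {
    let render_app = app.sub_app_mut(RenderApp);
    let mut graph = render_app.world.resource_mut::<RenderGraph>();
    // Fetch the sub-graph that the 3D cameras execute.
    let draw_3d_graph = graph.get_sub_graph_mut(core_3d::graph::NAME).unwrap();
    draw_3d_graph.add_node("my_node", MyNode);
    // Make our node run before the main 3D pass.
    draw_3d_graph
        .add_node_edge("my_node", core_3d::graph::node::MAIN_PASS)
        .unwrap();
}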

Going deeper into MainPass3dNode

We now know the high-level rendering stages (Extract through Cleanup) and how rendering execution is traversed as a graph, but we still don’t know much about how the core rendering pipelines work - the ones you get by default by including DefaultPlugins.

Looking through MainPass3dNode’s run method, we get reintroduced to RenderPhase (which we saw in the PhaseSort rendering stage). At a high level, the node generates calls that render all opaque objects, then all alpha masked objects (those where each texel is either fully opaque and thus rendered, or fully transparent and not rendered), and then all transparent objects.

For each of these, a RenderPhase (RenderPhase&lt;Opaque3d&gt; for the opaque phase) exists as a Component on the camera, and it holds all the items (in its items attribute, each implementing the PhaseItem trait) that should be rendered.

The obvious question now is who actually populates these RenderPhases with the relevant draw requests?

The most direct answer is that it depends on the material - for 3D meshes, you’re likely using a Material (StandardMaterial, if you didn’t specify one) from the bevy_pbr crate, and the generic system queue_material_meshes populates the items to be rendered for that Material. This system is added when you add MaterialPlugin for your material - or, in the case of StandardMaterial, it is set up for you by PbrPlugin, which is part of DefaultPlugins (if you have the default feature set enabled). Which phase (Opaque3d, AlphaMask3d, or Transparent3d) each item is added to depends on the item’s material properties.

queue_material_meshes itself needs to know which meshes to include. It configures an ECS query, material_meshes, that looks for all meshes that have a handle of that specific material type (one that implements the Material trait). In addition, it only considers entities that are visible in the view (think, “camera”), avoiding unnecessary drawing of objects that are, say, behind the camera.

However, like most things in Bevy, it’s possible to add your own system that adds your own items to a RenderPhase of a camera - the shader_instancing example does exactly that.

Visibility and RenderLayers

Visibility is a bit more complicated than I mentioned above - it’s not just “is the entity located in front of the camera”.

The Visibility Component allows your code to mark entities as visible or not. The standard bundles include this component (defaulting to being visible). You can then mark entities as not being visible by setting is_visible to false within this component.
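For example (the asset path here is just a placeholder):

use bevy::prelude::*;

fn setup(mut commands: Commands, asset_server: Res<AssetServer>) {
    // Spawn a sprite that starts out hidden; flipping is_visible back to true
    // in some later system will make it render again.
    commands.spawn_bundle(SpriteBundle {
        texture: asset_server.load("textures/my_sprite.png"),
        visibility: Visibility { is_visible: false },
        ..default()
    });
}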

This visibility setting follows the parent/child hierarchy - if a parent is set to not visible, all the child entities, even those whose is_visible is set to true, won’t be visible.

(There’s a ComputedVisibility Component as well, intended for advanced use only - it is updated by visibility systems (which you can provide yourself as well, of course) that run in the CoreStage::PostUpdate stage. You could use this to, say, lower the update frequency of entities that aren’t visible to any camera - you’d only know about the previous frame’s information, though…)

The real magic happens in the system labeled VisibilitySystems::CheckVisibility, which populates not only the ComputedVisibility Components for all entities, but also the VisibleEntities Component on each camera with a list of likely-visible entities (not accounting for, say, obstructions).

As part of this check, the Camera’s RenderLayers Component (if it exists) is checked against each Entity’s RenderLayers Component (if it exists). This is a “mask” of which of the 32 render layers this particular camera is able to see, or which layers the entity is part of. By default, if the component isn’t present, cameras can only see layer 0, and entities are part of layer 0.

This allows you to set up multiple cameras that each see (possibly overlapping) subsets of the entities. For example, one camera might be configured to 3D render something that shows up in some part of the UI (a character preview in an equipment screen?), and can only see entities relevant to that. Another might be generating a simplified top-down view. Another might be generating the view through a portal or what appears in a mirror.
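A sketch of the character-preview case (with a default PbrBundle standing in for the real mesh and material):

use bevy::prelude::*;
use bevy::render::view::RenderLayers;

fn setup(mut commands: Commands) {
    // A camera that can only see entities on render layer 1.
    commands
        .spawn_bundle(Camera3dBundle::default())
        .insert(RenderLayers::layer(1));

    // An entity placed on layer 1, so only the camera above will draw it;
    // the default camera (layer 0) will ignore it.
    commands
        .spawn_bundle(PbrBundle::default())
        .insert(RenderLayers::layer(1));
}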

Viewport and RenderTarget

Having every camera only able to render to the full size of the primary window is pretty limiting. Cameras can be configured to render to other windows (yes, Bevy can create more than one window), as well as to textures on the GPU - these are RenderTargets. These textures can then be referenced as either color textures for 2D sprites (for the character preview on an equipment screen case), or even used as an input to your own shaders. The render_to_texture example shows this, as well as RenderLayers.
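A condensed sketch of the render-to-texture setup, based on what the render_to_texture example does (see that example for the full version, including camera priorities and actually using the texture afterwards):

use bevy::prelude::*;
use bevy::render::camera::RenderTarget;
use bevy::render::render_resource::{
    Extent3d, TextureDescriptor, TextureDimension, TextureFormat, TextureUsages,
};

fn setup(mut commands: Commands, mut images: ResMut<Assets<Image>>) {
    // Create the GPU texture the camera will render into.
    let size = Extent3d { width: 512, height: 512, ..default() };
    let mut image = Image {
        texture_descriptor: TextureDescriptor {
            label: None,
            size,
            dimension: TextureDimension::D2,
            format: TextureFormat::Bgra8UnormSrgb,
            mip_level_count: 1,
            sample_count: 1,
            usage: TextureUsages::TEXTURE_BINDING
                | TextureUsages::COPY_DST
                | TextureUsages::RENDER_ATTACHMENT,
        },
        ..default()
    };
    image.resize(size); // zero-fill the texture data to match the size
    let image_handle = images.add(image);

    // This camera renders into the texture rather than a window; the handle
    // can then be used as a sprite texture or as an input to a custom shader.
    commands.spawn_bundle(Camera3dBundle {
        camera: Camera {
            target: RenderTarget::Image(image_handle),
            ..default()
        },
        ..default()
    });
}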

Cameras can also be configured to use a Viewport (ie, a rectangular subset of the full size) rather than the full size. For example, if your game’s UI always covers some parts of a screen, there’s no reason to render objects visible to the camera that would be obscured by that UI. This could also be used for a four-panel display of a particular object - from the top, side, front, and a perspective view. The split_screen example shows side-by-side camera rendering using Viewports.
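A sketch of a viewport-limited camera (hard-coding a 1280x720 window for illustration; the split_screen example derives the size from the actual window and updates it on resize events):

use bevy::prelude::*;
use bevy::render::camera::Viewport;

fn setup(mut commands: Commands) {
    // A camera limited to the left half of a 1280x720 window.
    commands.spawn_bundle(Camera3dBundle {
        camera: Camera {
            viewport: Some(Viewport {
                physical_position: UVec2::new(0, 0),
                physical_size: UVec2::new(640, 720),
                ..default()
            }),
            ..default()
        },
        ..default()
    });
}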

PhaseItem → RenderCommands

But how does the actual drawing happen for the items in the RenderPhases? That’s where PhaseItem comes in. It’s a really simple trait - from a drawing perspective, it has a draw_function method that returns an identifier for a draw function. The draw functions themselves are stored in a DrawFunctions&lt;P: PhaseItem&gt; resource in the render app for later lookup.

Looking at how PhaseItem is implemented for the Opaque3d phase, the draw function is passed in on creation (you can, of course, do something different for your own). The bevy_pbr material system gets this draw function from the DrawMaterial for the particular material, which is defined as a tuple of RenderCommands.

With RenderCommands, we’re starting to get very close to the wgpu implementation details. These are the individual steps that are composed together to achieve the drawing. As an example, here is DrawMaterial in bevy_pbr:

type DrawMaterial<M> = (
    SetItemPipeline,
    SetMeshViewBindGroup<0>,
    SetMaterialBindGroup<M, 1>,
    SetMeshBindGroup<2>,
    DrawMesh,
);

SetItemPipeline gets the item’s GPU drawing pipeline (the combination of shaders, bind group layouts, buffer slots, and so forth) - this is part of the bevy_render crate and likely reused across most implementations.

The rest are from the bevy_pbr crate:

SetMeshViewBindGroup collects all the view-related data for the item (which camera, which light(s)) and sets the bind group at the configured index with that information.

SetMeshBindGroup collects mesh-specific data and similarly configures its bind group.

SetMaterialBindGroup does the same thing for material-related data.

Finally, DrawMesh actually does the draw call: it looks up the GPU-side mesh buffer and, if it’s present, sets up the vertex buffer and issues the call.

Shadow mapping for comparison

If you understand shadow mapping, you’ll know we need to generate a texture that represents the locations that a particular light (usually a DirectionalLight) sees (much like normal rendering is done from the perspective of a camera). During normal rendering, we consult this shadow map texture to determine if the location we’re rendering is one that the light sees. If so, this item gains the benefit of the light, otherwise it is in shadow for that light.

We want to find where the PhaseItems (which will specify the draw calls) are being added to a RenderPhase of some sort. We could try to find the Node, the RenderPhase, or the RenderCommands involved to find a place to start. We could also start at a Plugin.

PbrPlugin helps a lot here - it initializes resources and/or sub-plugins to extract resources and components, as well as initializing the DrawFunctions for each RenderPhase and registering the RenderCommands. It also grabs hold of the core 3D rendering pipeline’s RenderGraph (the per-camera sub-graph) and adds shadow_pass_node to it.

The analog to the core 3d pipeline’s queue_material_meshes in the shadow mapping world is queue_shadows. Instead of iterating through the cameras and the entities visible from the cameras, it iterates through the lights, and the entities visible from the lights. It only considers meshes on entities that don’t have the NotShadowCaster component on them.

The draw function is DrawShadowMesh, which is composed of RenderCommands:
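From memory, in Bevy 0.8 it looks roughly like this (treat it as a sketch) - note the view bind group is shadow-specific, and there is no material bind group at all:

type DrawShadowMesh = (
    SetItemPipeline,
    SetShadowViewBindGroup<0>,
    SetMeshBindGroup<1>,
    DrawMesh,
);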

One other key difference is that the pipeline being set in the PhaseItem is different. Unlike the normal rendering pipeline, it only has a vertex shader configured, and it only uses a depth texture. That’s because shadow maps don’t care about what meshes look like, only where they are, which comes from the depth (essentially, the distance from the light for this location, from the light’s perspective).

This essentially means that we’re rendering a mesh with a different material than the entity specified. We don’t even need to render the mesh using the draw method (triangles) that is specified as part of the mesh. So we could draw the mesh wireframe. Which is exactly how the wireframe rendering in Bevy works.

Pipelines, shaders, bind groups, …

Much of how the Bevy rendering system works is a fairly direct mapping to the equivalently-named wgpu concepts. Bevy makes it easier to generate some of these (like the AsBindGroup derive macro mentioned below), but you’ll likely need to know how these work before too long. I’ve skipped all these (and the Bevy pipeline management systems that skip rendering when resources aren’t available yet) since there are way better places to learn about them - like Learn Wgpu, as well as the backend-specific resources, especially for Vulkan.

Loose ends

PhaseSort

Each of the RenderPhases has a preferred rendering order based on distance. For the Opaque and AlphaMask phases, you want to render the closest objects first, as they will likely obscure objects behind them; for objects that are behind something already rendered, we don’t pay the cost of running the fragment shader. Transparent phases need to be rendered from farthest to closest, however, for transparency to blend in the right order. PhaseSort does this sorting, with the distances provided by whoever added the item to the RenderPhase.

(The RenderPhases themselves are added to the cameras as part of the Extract stage on the core rendering plugins.)

Extract

Since everything involved in rendering happens in a different ECS world from your normal code, and that rendering world is cleared every frame, everything that’s involved in rendering needs to be copied over. You don’t need to do this unless you’re changing the rendering - the cameras, meshes, and so forth are all handled for you. Any components and resources you add aren’t copied - why pay the cost if they’re not needed.

The components and resources aren’t necessarily just direct copies. If you want direct copies, there are derive macros ExtractComponent and ExtractResource for doing that, paired with the plugins ExtractComponentPlugin and ExtractResourcePlugin. You can also implement the traits ExtractComponent and ExtractResource to specify how to extract your components and resources into the render world.
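For the direct-copy case, a sketch (FogSettings is a hypothetical resource) looks like this:

use bevy::prelude::*;
use bevy::render::extract_resource::{ExtractResource, ExtractResourcePlugin};

// Clone is required because the derived extraction simply clones the resource
// from the main world into the render world each frame.
#[derive(Clone, ExtractResource)]
struct FogSettings {
    density: f32,
}

fn main() {
    App::new()
        .add_plugins(DefaultPlugins)
        .insert_resource(FogSettings { density: 0.02 })
        // Copies the resource into the render world during the Extract stage.
        .add_plugin(ExtractResourcePlugin::<FogSettings>::default())
        .run();
}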

What next?

With a rough idea of what’s going on in Bevy, there are a bunch of options now, but most of these require you to start taking a deeper dive into non-Bevy rendering topics first as I mentioned.

The easiest first step is probably writing your own fragment shader - the thing that decides what colour your material has. This doesn’t require creating your own pipeline, since the bevy_pbr crate’s MaterialPlugin and some other helpers will do that for you. There’s a simple example in the bevy repo - shader_material - coming in at ~65 lines of Rust-side code and 17 lines of shader language. (This is less than half of what it used to be, thanks to some ergonomic attention to simplifying setting up buffers and bind groups for materials.)
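A sketch of what that looks like, roughly following the shader_material example (the shader path and UUID are placeholders, and the WGSL file itself isn’t shown):

use bevy::prelude::*;
use bevy::reflect::TypeUuid;
use bevy::render::render_resource::{AsBindGroup, ShaderRef};

// AsBindGroup generates the GPU-side layout and bind group for the material;
// the uniform(0) attribute exposes `color` at binding 0 in the shader.
#[derive(AsBindGroup, TypeUuid, Clone)]
#[uuid = "b62bb455-a72c-4b56-87bb-81e0554e234f"]
struct MyMaterial {
    #[uniform(0)]
    color: Color,
}

impl Material for MyMaterial {
    fn fragment_shader() -> ShaderRef {
        "shaders/my_material.wgsl".into()
    }
}

fn main() {
    App::new()
        .add_plugins(DefaultPlugins)
        // MaterialPlugin sets up extraction, pipelines, and queueing for us.
        .add_plugin(MaterialPlugin::<MyMaterial>::default())
        .run();
}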

With the background of how the rendering works end-to-end, many of the Bevy built-in examples are much more approachable now. The game of life example shows how relatively easy it is to set up compute shaders in Bevy.

In addition, there are examples and full implementations worth looking at in other places: