Exploring Bevy 0.8’s rendering process
Bevy’s rendering process isn’t particularly well-documented, and there aren’t great practical examples that show how it fits together either. As a learning exercise, and hopefully to help others, I decided to learn how it works and write up what I discovered.
The first steps
Ideally, the first place to look would be the documentation. Unfortunately, the Bevy Book doesn’t say anything about the rendering process - or anything beyond the minimum necessary to get a Bevy example built. The Bevy Cheat Book documents a lot more about Bevy in general, though not much in its rendering section; it does, however, introduce Render Stages, which at least gets us started.
Render Stages
We can see that RenderStage is defined in the bevy_render crate.
These rendering stages are the highest-level steps taken during rendering. Extract copies data that will be used during rendering (which will allow for the original data to be modified), Prepare will do some processing of that data, Queue will generate the GPU commands to be executed (draw calls), PhaseSort will sort something called RenderPhases, Render will actually execute those GPU commands, and Cleanup will perform post-render cleanup.
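For reference, the stage enum in bevy_render looks roughly like this (derives omitted):

pub enum RenderStage {
    Extract,
    Prepare,
    Queue,
    PhaseSort,
    Render,
    Cleanup,
}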
We can also see that the RenderStages are configured as part of the RenderPlugin plugin. Plugins are where the systems that run during Bevy execution are defined, so we’re definitely in the right place.
Unlike other stages, these aren’t on the main Bevy “app”, but rather a separate “render_app”, which is then added as a sub-application of the main one.
These Render Stages are hooks which the other Bevy plugins (and your own code) can use to run their own code. There is a simple example of how the Extract Render Stage can have a system added to it to copy time from the main app to the rendering app every frame.
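As a rough sketch of that pattern (the MyTimer resource here is hypothetical, and I’m assuming Bevy 0.8’s Extract system parameter), adding your own system to the Extract stage looks something like this:

use bevy::prelude::*;
use bevy::render::{Extract, RenderApp, RenderStage};

// Hypothetical main-world resource that we also want available during rendering.
#[derive(Default)]
struct MyTimer(f32);

// Extract systems run on the render app; Extract<...> reads from the main world.
fn extract_my_timer(mut commands: Commands, timer: Extract<Res<MyTimer>>) {
    commands.insert_resource(MyTimer(timer.0));
}

fn main() {
    let mut app = App::new();
    app.add_plugins(DefaultPlugins).init_resource::<MyTimer>();

    // The Extract stage lives on the render sub-app, not the main app.
    app.sub_app_mut(RenderApp)
        .add_system_to_stage(RenderStage::Extract, extract_my_timer);

    app.run();
}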
The most likely interesting stages will be Render or possibly Queue, so let’s see which plugins add systems to these stages.
It looks like RenderPlugin itself is the only plugin that hooks into Render. It only calls two systems: PipelineCache::process_pipeline_queue_system and render_system.
PipelineCache doesn’t have much documentation, but the code seems to be involved in ensuring that compute and render pipeline descriptors and the shaders they reference are configured on the GPU - not actually executing any rendering.
That leaves render_system, which is where we hit the jackpot - RenderGraphRunner does the heavy lifting based on the RenderGraph.
RenderGraph
RenderGraph has a good description of how it works in its source documentation. The executable logic (the actual draw calls) is in Nodes, and the dependencies between Nodes are stored as Edges. In addition, Nodes have input and output slots, allowing them to pass data between each other. Finally, there may be subgraphs.
RenderGraphRunner is responsible for taking all the configuration in the RenderGraph and executing it, which it does in run_graph, which is called recursively for subgraphs.
The game of life example shows how to add a new node to the render graph.
The Node trait is implemented in a few places that help us understand how Bevy’s default rendering is implemented:
- MainPass2dNode in bevy_core_pipeline crate
- MainPass3dNode in bevy_core_pipeline crate
- ShadowPassNode in bevy_pbr crate (for generating shadow maps per light)
- UiPassNode in bevy_ui crate, for UI rendering
- CameraDriverNode in bevy_render crate’s camera code
This seems to make sense - you can imagine that you’d want something like this for a standard 3D game:
- First, generate shadow maps for any lights
- Then render the view from a 3D camera including any 3D meshes.
- Finally, draw the UI on top.
For a 2D game, you’d probably just have:
- Render the view from a 2D-style camera using 2D meshes (usually sprites on quads)
- Draw the UI on top.
So what is CameraDriverNode? MainPass2dNode and MainPass3dNode seem to be doing all the work from the perspective of their cameras - why is there another camera node?
Those nodes aren’t actually being added to the primary render graph - the core_2d and core_3d libraries actually create their own render graphs (per camera), and add them (without being executed) to the primary render graph. It can be hard to see, but line 64 is where graph (the primary render graph) is fetched, and it is only used once - to add the sub-graph, on line 80. These subgraphs are executed in CameraDriverNode.
Getting practical (in theory) with RenderGraph
With this information, we can now speculate how to achieve some things.
For example, we could imagine that a Bevy debugging UI would have its own node, added to the render graph with an edge to the camera driver node so that it executes after it. And if you look at bevy_egui’s setup code, that’s exactly how it works. This ensures the debugging UI will always appear above the standard rendered content and, in more complicated setups, doesn’t end up being rendered to some alternate target besides the primary window.
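A sketch of that pattern, assuming Bevy 0.8’s render graph API (the node and its name here are hypothetical):

use bevy::prelude::*;
use bevy::render::render_graph::{Node, NodeRunError, RenderGraph, RenderGraphContext};
use bevy::render::renderer::RenderContext;
use bevy::render::{main_graph, RenderApp};

// Hypothetical node; a real one would record GPU commands via render_context.
struct DebugUiNode;

impl Node for DebugUiNode {
    fn run(
        &self,
        _graph: &mut RenderGraphContext,
        _render_context: &mut RenderContext,
        _world: &World,
    ) -> Result<(), NodeRunError> {
        Ok(())
    }
}

fn main() {
    let mut app = App::new();
    app.add_plugins(DefaultPlugins);

    let render_app = app.sub_app_mut(RenderApp);
    let mut graph = render_app.world.resource_mut::<RenderGraph>();
    graph.add_node("debug_ui", DebugUiNode);
    // Edge: the camera driver node must run before our node.
    graph
        .add_node_edge(main_graph::node::CAMERA_DRIVER, "debug_ui")
        .unwrap();

    app.run();
}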
We’ve already seen an example (game_of_life) of adding a Node to the primary render graph to perform some computation that can subsequently be rendered as a sprite for the core 2D rendering pipeline.
You can imagine some post-processing effects (say, greying out or blurring the screen during a pause menu) would be implemented as a Node that executes after the core rendering pipeline and uses its results.
You can even insert steps into the core 2D/3D rendering pipelines. Using RenderGraph’s get_sub_graph_mut method, you can grab those subgraphs and insert new nodes into them. That’s how the shadow maps are inserted for the PBR materials - the shadow maps need to be generated for use by the PBR shaders.
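A sketch of that, again assuming Bevy 0.8’s API (the node, its name, and its placement are hypothetical):

use bevy::core_pipeline::core_3d;
use bevy::prelude::*;
use bevy::render::render_graph::{Node, NodeRunError, RenderGraph, RenderGraphContext};
use bevy::render::renderer::RenderContext;
use bevy::render::RenderApp;

// Hypothetical no-op node, as in the previous sketch.
struct PrePassNode;

impl Node for PrePassNode {
    fn run(
        &self,
        _graph: &mut RenderGraphContext,
        _render_context: &mut RenderContext,
        _world: &World,
    ) -> Result<(), NodeRunError> {
        Ok(())
    }
}

fn main() {
    let mut app = App::new();
    app.add_plugins(DefaultPlugins);

    let render_app = app.sub_app_mut(RenderApp);
    let mut graph = render_app.world.resource_mut::<RenderGraph>();
    // Reach into the core 3D subgraph rather than the primary graph.
    let graph_3d = graph.get_sub_graph_mut(core_3d::graph::NAME).unwrap();
    graph_3d.add_node("pre_pass", PrePassNode);
    // Make our node run before the main 3D pass.
    graph_3d
        .add_node_edge("pre_pass", core_3d::graph::node::MAIN_PASS)
        .unwrap();

    app.run();
}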
Going deeper into MainPass3dNode
Now that we know the high-level rendering stages (Extract through Cleanup) and how the rendering execution is traversed as a graph, we still don’t know much about how the core rendering pipelines work - the ones you get by default by including DefaultPlugins.
Looking through MainPass3dNode’s run method, we get reintroduced to RenderPhase (which we saw in the PhaseSort rendering stage). At a high level, the node generates calls that render all opaque objects, then all alpha-masked objects (those where each texel is either fully opaque and thus rendered, or fully transparent and not rendered), and then all transparent objects.
For each of these, a RenderPhase (Opaque3d for the opaque phase) exists as a Component on the camera, and it has all the items (on the items attribute, implementing the trait PhaseItem) that should be rendered.
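To make that concrete, a render-app system could query those phases directly. A sketch, assuming Bevy 0.8’s types (it would need to be added to a render stage that runs after Queue):

use bevy::core_pipeline::core_3d::Opaque3d;
use bevy::prelude::*;
use bevy::render::render_phase::RenderPhase;

// Reports how many opaque items each 3D camera has queued this frame.
fn count_opaque_items(phases: Query<&RenderPhase<Opaque3d>>) {
    for phase in phases.iter() {
        info!("opaque items queued: {}", phase.items.len());
    }
}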
The obvious question now is: who actually populates these RenderPhases with the relevant draw requests?
The most direct answer is that it depends on the material - for 3D meshes, you’re likely using a Material (StandardMaterial, if you didn’t specify one) from the bevy_pbr crate, and the generic system queue_material_meshes is going to populate the items to be rendered for that Material. This system is added when you add MaterialPlugin for your material - or, in the case of StandardMaterial, it is set up for you in PbrPlugin, which is part of DefaultPlugins (if you have the default feature set enabled). Which of the phases (Opaque3d, AlphaMask3d, or Transparent3d) each item is added to depends on the item’s material properties.
queue_material_meshes itself needs to know what meshes to include. It configures an ECS query, material_meshes, that looks for all meshes that have a handle of that specific material type (one that implements the Material trait). In addition, it only considers entities that are visible in the view (think, “camera”), removing unnecessary drawing of objects that are, say, behind the camera.
However, like most things in Bevy, it’s possible to add your own system that adds your own items to a RenderPhase of a camera - the shader_instancing example does exactly that.
Visibility and RenderLayers
Visibility is a bit more complicated than I mentioned above - it’s not just “is the entity located in front of the camera”.
The Visibility Component allows your code to mark entities as visible or not. The standard bundles include this component (defaulting to visible). You can then mark entities as not visible by setting is_visible to false within this component.
This visibility setting follows the parent/child hierarchy - if a parent is set to not visible, all of its child entities, even those whose is_visible is set to true, won’t be visible.
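For example (a sketch; the Hidden marker component is made up):

use bevy::prelude::*;

// Hypothetical marker for things we want to hide.
#[derive(Component)]
struct Hidden;

// Flip the Visibility component off for marked entities (their children follow suit).
fn hide_marked(mut query: Query<&mut Visibility, With<Hidden>>) {
    for mut visibility in query.iter_mut() {
        visibility.is_visible = false;
    }
}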
(There’s a ComputedVisibility Component as well, intended for advanced use only - it is updated by visibility systems (which you can provide yourself as well, of course) that run in the CoreStage::PostUpdate stage. You could use this to, say, lower the update frequency of entities that aren’t visible on any camera - you’d only know about the previous frame’s information, though…)
The real magic happens in the system labeled VisibilitySystems::CheckVisibility, which populates not only the ComputedVisibility Components for all entities, but also the VisibleEntities Component on each camera with a list of likely-visible entities (not accounting for, say, obstructions).
As part of this check, the Camera’s RenderLayers Component (if it exists) is checked against each Entity’s RenderLayers Component (if it exists). This is a mask of which of the 32 render layers this particular camera is able to see, or which layers the entity is part of. By default, if the component isn’t present, cameras can only see layer 0, and entities are part of layer 0.
This allows you to set up multiple cameras that can each see (possibly overlapping) subsets of entities. For example, one camera might be configured to 3D render something that shows up in some part of the UI (a character preview in an equipment screen?), and can only see entities relevant to that. Another might be generating a simplified top-down view. Another might be generating the view through a portal or what appears on a mirror.
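A sketch of the character-preview case (camera placement, render target, and the actual preview entities are omitted; assuming Bevy 0.8’s API):

use bevy::prelude::*;
use bevy::render::view::RenderLayers;

fn spawn_preview(mut commands: Commands) {
    // A second camera that only sees render layer 1.
    commands
        .spawn_bundle(Camera3dBundle::default())
        .insert(RenderLayers::layer(1));

    // A mesh only visible to cameras that include layer 1
    // (the default camera sees only layer 0, so it ignores this entity).
    commands
        .spawn_bundle(PbrBundle::default())
        .insert(RenderLayers::layer(1));
}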
Viewport and RenderTarget
Having every camera only able to render to the full size of the primary window is pretty limiting. Cameras can be configured to render to other windows (yes, Bevy can create more than one window), as well as to textures on the GPU - these are RenderTargets. These textures can then be referenced as color textures for 2D sprites (for the character-preview-on-an-equipment-screen case), or even used as an input to your own shaders. The render_to_texture example shows this, as well as RenderLayers.
Cameras can also be configured to use a Viewport (i.e., a rectangular subset of the full size) rather than the full size. For example, if your game’s UI always covers some part of the screen, there’s no reason to render objects visible to the camera that would be obscured by that UI. This could also be used for a four-panel display of a particular object - from the top, side, front, and a perspective view. The split_screen example shows side-by-side camera rendering using Viewports.
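A sketch of a camera restricted to the left half of a 1280x720 window (the sizes are illustrative; assuming Bevy 0.8’s Camera and Viewport fields):

use bevy::prelude::*;
use bevy::render::camera::Viewport;

fn spawn_left_half_camera(mut commands: Commands) {
    commands.spawn_bundle(Camera3dBundle {
        camera: Camera {
            // Only render into the left half of the window.
            viewport: Some(Viewport {
                physical_position: UVec2::new(0, 0),
                physical_size: UVec2::new(640, 720),
                ..default()
            }),
            ..default()
        },
        ..default()
    });
}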
PhaseItem → RenderCommands
But how does the actual drawing happen for the items in the RenderPhases? That’s where PhaseItem comes in. It’s a really simple trait - from a drawing perspective, it has a draw_function method that returns an identifier for a draw function; the draw functions themselves are stored in a DrawFunctions<P: PhaseItem> resource in the render app for later lookup.
Looking at how PhaseItem is implemented for the Opaque3d phase, the draw function is passed in on creation (you can, of course, do something different for your own). The bevy_pbr material system gets this draw function from the DrawMaterial for the particular material, which is defined as a tuple of RenderCommands.
RenderCommands are starting to get very close to the wgpu implementation details. These are the individual steps that are composed together to achieve the drawing. As an example, for DrawMaterial in bevy_pbr:
type DrawMaterial<M> = (
    SetItemPipeline,
    SetMeshViewBindGroup<0>,
    SetMaterialBindGroup<M, 1>,
    SetMeshBindGroup<2>,
    DrawMesh,
);
SetItemPipeline gets the item’s GPU drawing pipeline (the combination of shaders, bind group layouts, buffer slots, and so forth) - this is part of the bevy_render crate and likely reused across most implementations.
The rest are from the bevy_pbr crate:
- SetMeshViewBindGroup collects all the view-related data for the item (which camera, which light(s)) and sets the bind group at the configured index with that information.
- SetMeshBindGroup collects mesh-specific data and similarly configures its bind group.
- SetMaterialBindGroup does the same thing for material-related data.
- Finally, DrawMesh actually does the draw calls, looking up the GPU-side mesh buffer and, if present, setting up the vertex buffer and making the call.
Shadow mapping for comparison
If you understand shadow mapping, you’ll know we need to generate a texture that represents the locations that a particular light (usually a DirectionalLight) sees (much like normal rendering is done from the perspective of a camera). During normal rendering, we consult this shadow map texture to determine whether the location we’re rendering is one that the light sees. If so, this item gains the benefit of the light; otherwise it is in shadow for that light.
We want to find where the PhaseItems (which will specify the draw calls) are being added to a RenderPhase of some sort. We could try to find the Node, the RenderPhase, or the RenderCommands involved to find a place to start. We could also start at a Plugin.
PbrPlugin helps a lot - since we need to initialize resources and/or sub-plugins to extract resources/components, as well as initialize DrawFunctions per RenderPhase and register RenderCommands. It also happens to grab hold of the core 3D rendering pipeline’s RenderGraph (the per-camera subgraph) and adds shadow_pass_node to it.
The analog to the core 3D pipeline’s queue_material_meshes in the shadow mapping world is queue_shadows. Instead of iterating through the cameras and the entities visible from the cameras, it iterates through the lights and the entities visible from the lights. It only considers meshes on entities that don’t have the NotShadowCaster component on them.
The draw function is DrawShadowMesh, which is composed of these RenderCommands:
- SetItemPipeline (the same one from bevy_render the normal pipeline uses)
- SetShadowViewBindGroup (things specific to the light we’re rendering from)
- SetMeshBindGroup (the same one the normal pipeline uses)
- DrawMesh (the same one the normal pipeline uses)
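Put together, the tuple in bevy_pbr looks roughly like this (the bind group indices are from memory and may not be exact):

type DrawShadowMesh = (
    SetItemPipeline,
    SetShadowViewBindGroup<0>,
    SetMeshBindGroup<1>,
    DrawMesh,
);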
One other key difference is that the pipeline being set in the PhaseItem is different. Unlike the normal rendering pipeline, it only has a vertex shader configured, and it only uses a depth texture. That’s because shadow maps don’t care about what meshes look like, only where they are, which comes from the depth (essentially, the distance from the light for this location in the light’s perspective).
This essentially means that we’re rendering a mesh with a different material than the entity specified. We don’t even need to render the mesh using the draw method (triangles) that is specified as part of the mesh - we could draw the mesh as a wireframe instead, which is exactly how wireframe rendering in Bevy works.
Pipelines, shaders, bind groups, …
Much of how the Bevy rendering system works is a fairly direct mapping to the equivalently-named wgpu concepts. Bevy makes it easier to generate some of these (like the AsBindGroup derive macro mentioned below), but you’ll likely need to know how these work before too long. I’ve skipped all these (and the Bevy pipeline management systems that skip rendering when resources aren’t available yet) since there are way better places to learn about them - like Learn Wgpu, as well as the backend-specific resources, especially for Vulkan.
Loose ends
PhaseSort
Each of the RenderPhases has a preferred rendering order based on distance. For the Opaque and AlphaMask phases, you want to render the closest objects first, as they will likely obscure objects behind them; for objects that are behind something that has already been rendered, we don’t pay the cost of running the fragment shader. Transparent phases need to be rendered from farthest to closest, however, in order for transparency to be blended in the right order. PhaseSort does this sorting, with the distances provided by whoever added the item to the RenderPhase.
(The RenderPhases themselves are added to the cameras as part of the Extract stage by the core rendering plugins.)
Extract
Since everything involved in rendering happens in a different ECS world from your normal code, and that rendering world is cleared every frame, everything that’s involved in rendering needs to be copied over. You don’t need to do this yourself unless you’re changing the rendering - the cameras, meshes, and so forth are all handled for you. Any components and resources you add aren’t copied - why pay the cost if they’re not needed?
The components and resources aren’t necessarily just direct copies. If you want direct copies, there are derive macros ExtractComponent and ExtractResource for doing that, paired with the plugins ExtractComponentPlugin and ExtractResourcePlugin. You can also implement the ExtractComponent and ExtractResource traits yourself to specify how to extract your components and resources into the render world.
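A sketch of the direct-copy path for a resource (the FogSettings resource is made up; assuming Bevy 0.8’s extract_resource module):

use bevy::prelude::*;
use bevy::render::extract_resource::{ExtractResource, ExtractResourcePlugin};

// Hypothetical resource the render world also needs; the derive just clones
// the main-world value into the render world each frame.
#[derive(Clone, Default, ExtractResource)]
struct FogSettings {
    density: f32,
}

fn main() {
    App::new()
        .add_plugins(DefaultPlugins)
        .init_resource::<FogSettings>()
        .add_plugin(ExtractResourcePlugin::<FogSettings>::default())
        .run();
}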
What next?
With a rough idea of what’s going on in Bevy, there are a bunch of options now, but most of them require you to start taking a deeper dive into non-Bevy rendering topics first, as I mentioned.
The easiest first step is probably writing your own fragment shader - the thing that decides what colour your material has. This doesn’t require creating your own pipeline yourself, since the bevy_pbr crate’s MaterialPlugin and some other helpers will do that for you. There’s a simple example in the bevy repo - shader_material - coming in at ~65 lines of Rust-side code and 17 lines of shader language. (This is less than half of what it used to be, thanks to some ergonomic attention to simplifying the setup of buffers and bind groups for materials.)
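A stripped-down sketch of that pattern, assuming Bevy 0.8’s Material/AsBindGroup API (the material, uuid, and shader path are all made up - see shader_material for the real thing; you’d still need to spawn a camera and a mesh using the material):

use bevy::prelude::*;
use bevy::reflect::TypeUuid;
use bevy::render::render_resource::{AsBindGroup, ShaderRef};

// Hypothetical material: a single colour uniform at binding 0.
#[derive(AsBindGroup, TypeUuid, Clone)]
#[uuid = "4ee9c363-1124-4113-890e-199d81b00281"]
struct MyMaterial {
    #[uniform(0)]
    color: Color,
}

impl Material for MyMaterial {
    // Only override the fragment shader; everything else uses the PBR defaults.
    fn fragment_shader() -> ShaderRef {
        "shaders/my_material.wgsl".into()
    }
}

fn main() {
    App::new()
        .add_plugins(DefaultPlugins)
        // MaterialPlugin sets up extraction, queueing, and the render pipeline for us.
        .add_plugin(MaterialPlugin::<MyMaterial>::default())
        .run();
}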
With the background of how the rendering works end-to-end, many of the Bevy built-in examples are much more approachable now. The game of life example shows how relatively easy setting up compute shaders in Bevy is.
In addition, there are examples and full implementations worth looking at elsewhere:
- …