Shaders composition

note

The code described in this section can be run from the compose.rs example with cargo run --features derive --example compose. Complete source code is also provided at the end of this page.

The main feature of wgcore is to provide the ability to share and compose WGSL shaders easily across crates. Roughly speaking, wgcore exposes a trait and a derive macro that generate the boilerplate needed by naga-oil for shader composition.

This is achieved by mapping one shader (one .wgsl file) to one Rust structure. That Rust structure can then be used to create any compute kernel present in the shader and/or to import its content from another shader. Furthermore, since it’s a regular Rust struct, it can be exported and used across crates easily.

The Shader trait

Say you have two .wgsl shaders: dependency.wgsl exposes a WGSL structure and a mathematical function, while kernel.wgsl defines a compute pipeline entrypoint that calls functions from dependency.wgsl. The dependency shader contains a naga-oil #define_import_path statement indicating that it is intended to be imported from other shaders. It looks roughly like the sketch below, reconstructed here from the flattened output shown later on this page:
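
#define_import_path composable::module

// A struct and a function exported for use by other shaders.
struct MyStruct {
    value: f32,
}

fn shared_function(a: MyStruct, b: MyStruct) -> MyStruct {
    return MyStruct(a.value + b.value);
}

The kernel shader contains an #import statement indicating that it requires the symbols exported by the dependency shader: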

#import composable::module as Dependency

@group(0) @binding(0)
var<storage, read_write> a: array<Dependency::MyStruct>;
@group(0) @binding(1)
var<storage, read> b: array<Dependency::MyStruct>;

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(global_invocation_id) invocation_id: vec3<u32>) {
    let i = invocation_id.x;
    if i < arrayLength(&a) {
        a[i] = Dependency::shared_function(a[i], b[i]);
    }
}

On the Rust side, a structure can be associated with each shader by deriving the Shader trait. For dependency.wgsl, the derive is fairly straightforward. Note that the path to the .wgsl file (relative to the .rs source file) needs to be provided.

#[derive(Shader)]
#[shader(
src = "dependency.wgsl"
)]
struct Composable;

info

The shader file indicated with shader(src = ...) will be embedded in the crate/executable, so no additional action is needed to package the .wgsl asset file.

For the shader containing the compute pipeline entrypoint, a few more attributes are needed:

  • shader(derive(Composable)) indicates that our kernel.wgsl shader depends on the source code of the shader associated with the Composable structure. All the naga_oil boilerplate to compose the .wgsl modules will be generated automatically. Note that multiple dependencies can be specified when needed, e.g., derive(Composable1, Composable2).
  • shader(composable = false) indicates that WgKernel cannot be reused as a dependency of another shader. In practice this is required if the .wgsl file does not contain a #define_import_path line.
#[derive(Shader)]
#[shader(
    derive(Composable), // This shader depends on the `Composable` shader.
    src = "kernel.wgsl", // Shader source code, will be embedded in the exe with `include_str!`.
    composable = false // This shader doesn’t export any symbols reusable from other wgsl shaders.
)]
struct WgKernel {
    main: ComputePipeline,
}

Our compute shader also defines an entrypoint for a compute pipeline. Here, the entrypoint is named main in the .wgsl sources. By specifying a field main: ComputePipeline in the Rust struct, the Shader derive will automatically generate the boilerplate to instantiate this compute pipeline with WgKernel::from_device(wgpu_device).
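
A shader defining several entrypoints can declare one ComputePipeline field per entrypoint. A minimal sketch, assuming the derive matches each field to the .wgsl entrypoint of the same name (multi_kernel.wgsl and its entrypoint names are hypothetical):

#[derive(Shader)]
#[shader(src = "multi_kernel.wgsl", composable = false)]
struct WgMultiKernel {
    // Hypothetical: one pipeline per @compute entrypoint of the same name in multi_kernel.wgsl.
    reduce: ComputePipeline,
    scan: ComputePipeline,
}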

danger

If a .wgsl file does not contain a #define_import_path line, the composable = false attribute must be added to avoid a compile-time error.

With these associations between Rust structs and .wgsl shaders in place, we can now instantiate the compute pipeline easily, without any additional boilerplate:

#[async_std::main]
async fn main() -> anyhow::Result<()> {
    let gpu = GpuInstance::new().await?; // Helper to initialize the GPU device and queue.
    let kernel = WgKernel::from_device(gpu.device())?; // Initialize the compute pipeline.
    println!("{}", WgKernel::flat_wgsl()?); // Prints the source code of your kernel after all dependency resolution.
    Ok(())
}

This prints the following "flattened" shader. This is what is actually being compiled by naga:

struct MyStructX_naga_oil_mod_XMNXW24DPONQWE3DFHI5G233EOVWGKX {
    value: f32,
}

@group(0) @binding(0)
var<storage, read_write> a_1: array<MyStructX_naga_oil_mod_XMNXW24DPONQWE3DFHI5G233EOVWGKX>;
@group(0) @binding(1)
var<storage> b_1: array<MyStructX_naga_oil_mod_XMNXW24DPONQWE3DFHI5G233EOVWGKX>;

fn shared_functionX_naga_oil_mod_XMNXW24DPONQWE3DFHI5G233EOVWGKX(a: MyStructX_naga_oil_mod_XMNXW24DPONQWE3DFHI5G233EOVWGKX, b: MyStructX_naga_oil_mod_XMNXW24DPONQWE3DFHI5G233EOVWGKX) -> MyStructX_naga_oil_mod_XMNXW24DPONQWE3DFHI5G233EOVWGKX {
    return MyStructX_naga_oil_mod_XMNXW24DPONQWE3DFHI5G233EOVWGKX((a.value + b.value));
}

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(global_invocation_id) invocation_id: vec3<u32>) {
    let i: u32 = invocation_id.x;
    if (i < arrayLength((&a_1))) {
        let _e9: MyStructX_naga_oil_mod_XMNXW24DPONQWE3DFHI5G233EOVWGKX = a_1[i];
        let _e12: MyStructX_naga_oil_mod_XMNXW24DPONQWE3DFHI5G233EOVWGKX = b_1[i];
        let _e13: MyStructX_naga_oil_mod_XMNXW24DPONQWE3DFHI5G233EOVWGKX = shared_functionX_naga_oil_mod_XMNXW24DPONQWE3DFHI5G233EOVWGKX(_e9, _e12);
        a_1[i] = _e13;
        return;
    } else {
        return;
    }
}

info

Your output will differ from this one due to various factors like different UUID generation and different versions of naga, naga_oil, wgpu, and wgcore.

Running the kernel

In addition to initializing the compute pipeline, wgcore exports some convenient utilities for actually running it. First, the input buffers can be initialized easily using the GpuScalar, GpuVector, and GpuMatrix wrappers (they all initialize and contain a regular wgpu Buffer).

let a = GpuVector::init(gpu.device(), &[1.0, 2.0, 3.0], BufferUsages::STORAGE);
let b = GpuVector::init(gpu.device(), &[4.0, 5.0, 6.0], BufferUsages::STORAGE);
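
The other wrappers follow the same pattern. A minimal sketch, assuming GpuScalar::init and GpuMatrix::init mirror the GpuVector::init signature shown above (the exact signatures are not verified here):

// Assumed by analogy with GpuVector::init; check the wgcore API for the exact signatures.
let scale = GpuScalar::init(gpu.device(), 2.0f32, BufferUsages::UNIFORM);
let m = nalgebra::Matrix2::new(1.0f32, 2.0, 3.0, 4.0);
let mat = GpuMatrix::init(gpu.device(), &m, BufferUsages::STORAGE);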

Then we can bind and queue the compute pipeline with these inputs using the KernelInvocationBuilder:

let mut queue = KernelInvocationQueue::new(gpu.device());
KernelInvocationBuilder::new(&mut queue, &kernel.main)
    .bind0([a.buffer(), b.buffer()]) // Bind the buffers in group 0.
    .queue(LEN.div_ceil(64)); // Dispatch ceil(LEN / 64) workgroups; 64 is the workgroup size.

Finally, we encode the queue and submit the calculation to the GPU:

let mut encoder = gpu.device().create_command_encoder(&Default::default());
queue.encode(&mut encoder, None);
gpu.queue().submit(Some(encoder.finish()));

tip

The complete example below also includes the code needed to read the result back to RAM and print it to the console.
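
The gist of it: the result is copied into a staging buffer created with MAP_READ, which is then mapped and read back to RAM (excerpted from the complete example):

// Staging buffer the GPU result is copied into before being mapped to RAM.
let staging = GpuVector::uninit(
    gpu.device(),
    LEN,
    BufferUsages::COPY_DST | BufferUsages::MAP_READ,
);
// After queuing the kernel: record the copy, submit, then map & read.
staging.copy_from(&mut encoder, &a_buf);
gpu.queue().submit(Some(encoder.finish()));
let result = staging
    .read(gpu.device())
    .await
    .expect("Failed to read result from the GPU.");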

Complete example

#[cfg(not(feature = "derive"))]
std::compile_error!(
    r#"
###############################################################
## The `derive` feature must be enabled to run this example. ##
###############################################################
"#
);

use nalgebra::DVector;
use std::fmt::Debug;
use wgcore::gpu::GpuInstance;
use wgcore::kernel::{KernelInvocationBuilder, KernelInvocationQueue};
use wgcore::tensor::GpuVector;
use wgcore::Shader;
use wgpu::{BufferUsages, ComputePipeline};

// Declare our shader module that contains our composable functions.
// Note that we don’t build any compute pipeline from this wgsl file.
#[derive(Shader)]
#[shader(
src = "compose_dependency.wgsl" // Shader source code, will be embedded in the exe with `include_str!`
)]
struct Composable;

#[derive(Shader)]
#[shader(
    derive(Composable), // This shader depends on the `Composable` shader.
    src = "compose_kernel.wgsl", // Shader source code, will be embedded in the exe with `include_str!`.
    composable = false // This shader doesn’t export any symbols reusable from other wgsl shaders.
)]
struct WgKernel {
    // This ComputePipeline field indicates that the Shader macro needs to generate the boilerplate
    // for loading the compute pipeline in `WgKernel::from_device`.
    main: ComputePipeline,
}

#[derive(Copy, Clone, PartialEq, Default, bytemuck::Pod, bytemuck::Zeroable)]
#[repr(C)]
pub struct MyStruct {
    value: f32,
}

// Optional: makes the debug output more concise.
impl Debug for MyStruct {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "{}", self.value)
    }
}

#[async_std::main]
async fn main() -> anyhow::Result<()> {
    // Initialize the gpu device and its queue.
    //
    // Note that `GpuInstance` is just a simple helper struct for initializing the gpu resources.
    // You are free to initialize them independently if more control is needed, or reuse the ones
    // that were already created/owned by e.g., a game engine.
    let gpu = GpuInstance::new().await?;

    // Load and compile our kernel. The `from_device` function was generated by the `Shader` derive.
    // Note that its dependency on `Composable` is automatically resolved by the `Shader` derive
    // too.
    let kernel = WgKernel::from_device(gpu.device())?;
    println!("######################################");
    println!("###### Composed shader sources: ######");
    println!("######################################");
    println!("{}", WgKernel::flat_wgsl()?);

    // Now, let’s actually run our kernel.
    let result = run_kernel(&gpu, &kernel).await;
    println!("Result: {:?}", result);

    Ok(())
}

async fn run_kernel(gpu: &GpuInstance, kernel: &WgKernel) -> Vec<MyStruct> {
    // Create the buffers.
    const LEN: u32 = 10;
    let a_data = DVector::from_fn(LEN as usize, |i, _| MyStruct { value: i as f32 });
    let b_data = DVector::from_fn(LEN as usize, |i, _| MyStruct {
        value: i as f32 * 10.0,
    });
    let a_buf = GpuVector::init(
        gpu.device(),
        &a_data,
        BufferUsages::STORAGE | BufferUsages::COPY_SRC,
    );
    let b_buf = GpuVector::init(gpu.device(), &b_data, BufferUsages::STORAGE);
    let staging = GpuVector::uninit(
        gpu.device(),
        LEN,
        BufferUsages::COPY_DST | BufferUsages::MAP_READ,
    );

    // Queue the operation.
    let mut queue = KernelInvocationQueue::new(gpu.device());
    KernelInvocationBuilder::new(&mut queue, &kernel.main)
        .bind0([a_buf.buffer(), b_buf.buffer()])
        .queue(LEN.div_ceil(64));

    // Encode & submit the operation to the gpu.
    let mut encoder = gpu.device().create_command_encoder(&Default::default());
    queue.encode(&mut encoder, None);
    // Copy the result to the staging buffer.
    staging.copy_from(&mut encoder, &a_buf);
    gpu.queue().submit(Some(encoder.finish()));

    // Read the result back from the gpu.
    staging
        .read(gpu.device())
        .await
        .expect("Failed to read result from the GPU.")
}