Buffer initialization

note

The code described in this section can be run from the encase.rs example with cargo run --features derive --example encase. Complete source code is also provided at the end of this page.

wgcore exposes some utilities to ease the creation and initialization of wgpu::Buffer. All the structs mentioned below are actually aliases of the more general GpuTensor, an n-dimensional array of values:

  • GpuScalar: for a storage buffer with a single element.
  • GpuVector: for a 1-dimensional array of values.
  • GpuMatrix: for a 2-dimensional array of values.

Initializing WebGPU buffers can be tricky, as the buffer layout must match the layout of the corresponding struct defined in the WGSL shader. To facilitate the conversion between Rust structures and raw bytes, wgcore can leverage either bytemuck or encase.
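To see why the layouts can diverge, consider a WGSL struct containing a vec4<f32>: WGSL aligns vec4<f32> to 16 bytes, so a Rust mirror of that struct needs explicit padding for the byte layouts to match. A minimal std-only sketch (ParticleRaw is a hypothetical name, not part of wgcore):

```rust
// Hypothetical mirror of this WGSL struct:
//     struct Particle { value: f32, pos: vec4<f32> }
// WGSL aligns `vec4<f32>` to 16 bytes, so the Rust side needs explicit
// padding after `value` for the two byte layouts to match.
#[repr(C)]
struct ParticleRaw {
    value: f32,
    _pad: [u32; 3], // 12 bytes of padding so `pos` starts at offset 16
    pos: [f32; 4],
}

fn main() {
    // 4 (value) + 12 (padding) + 16 (pos) = 32 bytes, matching the WGSL size.
    assert_eq!(std::mem::size_of::<ParticleRaw>(), 32);
    println!("size = {}", std::mem::size_of::<ParticleRaw>());
}
```

Getting this padding right by hand is exactly the chore that bytemuck (for already-matching layouts) and encase (for layouts that need conversion) take off your hands.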

Initialization with bytemuck

Initialization with bytemuck can be done using the ::init constructor of GpuScalar/GpuVector/GpuMatrix. Note that not all types are capable of implementing the bytemuck::Pod trait due to alignment and padding restrictions. If you can’t derive bytemuck::Pod for your own type, consider the solution based on encase.

  • If a type T implements the bytemuck::Pod trait, it can be passed to GpuScalar::init.
  • Any type implementing AsRef<[T]> (like Vec<T>, &[T], or DVector from nalgebra) can be passed to GpuVector::init.
  • Any matrix type, parametrized by T, from the nalgebra crate can be passed to GpuMatrix::init.
#[derive(Copy, Clone, bytemuck::Pod, bytemuck::Zeroable)]
#[repr(C)]
pub struct BytemuckStruct {
    value: f32,
}

#[async_std::main]
async fn main() -> anyhow::Result<()> {
    let gpu = GpuInstance::new().await?;
    let data = (0..1000)
        .map(|x| BytemuckStruct { value: x as f32 })
        .collect::<Vec<_>>();
    let gpu_buffer = GpuVector::init(gpu.device(), &data, BufferUsages::STORAGE);
    Ok(())
}
tip

Initialization based on bytemuck is always more efficient than initialization with encase, since the data is reinterpreted as raw bytes with no per-element conversion.

Initialization with encase

If the structure cannot implement the bytemuck::Pod trait, it is possible to rely on the encase::ShaderType trait instead by calling the ::encase constructor of GpuScalar/GpuVector/GpuMatrix.

  • If a type T implements the encase ShaderType trait, it can be passed to GpuScalar::encase.
  • Any type implementing AsRef<[T]> (like Vec<T>, &[T], or DVector from nalgebra) can be passed to GpuVector::encase.
  • Any matrix type, parametrized by T, from the nalgebra crate can be passed to GpuMatrix::encase.
#[derive(Copy, Clone, encase::ShaderType)]
#[repr(C)]
pub struct EncaseStruct {
    value: f32,
    // This implies some internal padding, so we can’t rely on bytemuck.
    // Encase will handle that properly.
    value2: Vector4<f32>,
}

#[async_std::main]
async fn main() -> anyhow::Result<()> {
    const LEN: u32 = 1000;
    let gpu = GpuInstance::new().await?;
    let data = (0..LEN)
        .map(|x| EncaseStruct {
            value: x as f32,
            value2: Vector4::repeat(x as f32 * 10.0),
        })
        .collect::<Vec<_>>();
    let gpu_buffer = GpuVector::encase(gpu.device(), &data, BufferUsages::STORAGE);
    Ok(())
}
danger

The ::encase constructor incurs an extra allocation and a conversion of every element in the provided vector or matrix. Because of this overhead, prefer initialization with bytemuck whenever possible.
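To make the overhead concrete, here is a std-only sketch of why a conversion is unavoidable for this layout. EncaseStructHost is a hypothetical stand-in for EncaseStruct, with [f32; 4] playing the role of Vector4<f32> so the example needs no external crates:

```rust
// Host-side mirror of `EncaseStruct`, with `[f32; 4]` standing in for
// `Vector4<f32>` (a hypothetical stand-in, just to measure sizes with std).
#[repr(C)]
struct EncaseStructHost {
    value: f32,
    value2: [f32; 4],
}

fn main() {
    // On the host, repr(C) packs this into 20 contiguous bytes (align 4).
    assert_eq!(std::mem::size_of::<EncaseStructHost>(), 20);
    // In WGSL, `vec4<f32>` is 16-byte aligned, so the same struct occupies
    // 32 bytes on the GPU (`value2` at offset 16). Since 20 != 32, a plain
    // memcpy of the host data cannot produce a valid GPU buffer: every
    // element must be re-laid-out, which is the conversion ::encase performs.
}
```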

Complete example

#[cfg(not(feature = "derive"))]
std::compile_error!(
    r#"
###############################################################
## The `derive` feature must be enabled to run this example. ##
###############################################################
"#
);

use nalgebra::Vector4;
use wgcore::gpu::GpuInstance;
use wgcore::kernel::{KernelInvocationBuilder, KernelInvocationQueue};
use wgcore::tensor::GpuVector;
use wgcore::Shader;
use wgpu::{BufferUsages, ComputePipeline};

#[derive(Copy, Clone, PartialEq, Debug, Default, bytemuck::Pod, bytemuck::Zeroable)]
#[repr(C)]
pub struct BytemuckStruct {
    value: f32,
}

#[derive(Copy, Clone, PartialEq, Debug, Default, encase::ShaderType)]
#[repr(C)]
pub struct EncaseStruct {
    value: f32,
    // This implies some internal padding, so we can’t rely on bytemuck.
    // Encase will handle that properly.
    value2: Vector4<f32>,
}

#[derive(Shader)]
#[shader(src = "encase.wgsl", composable = false)]
struct ShaderEncase {
    main: ComputePipeline,
}

#[async_std::main]
async fn main() -> anyhow::Result<()> {
    // Initialize the gpu device and its queue.
    //
    // Note that `GpuInstance` is just a simple helper struct for initializing the gpu resources.
    // You are free to initialize them independently if more control is needed, or reuse the ones
    // that were already created/owned by e.g., a game engine.
    let gpu = GpuInstance::new().await?;

    // Load and compile our kernel. The `from_device` function was generated by the `Shader` derive.
    // Note that its dependency to `Composable` is automatically resolved by the `Shader` derive
    // too.
    let kernel = ShaderEncase::from_device(gpu.device())?;

    // Create the buffers.
    const LEN: u32 = 1000;
    let a_data = (0..LEN)
        .map(|x| EncaseStruct {
            value: x as f32,
            value2: Vector4::repeat(x as f32 * 10.0),
        })
        .collect::<Vec<_>>();
    let b_data = (0..LEN)
        .map(|x| BytemuckStruct { value: x as f32 })
        .collect::<Vec<_>>();
    // Call `encase` instead of `init` because `EncaseStruct` isn’t `Pod`.
    // The `encase` function has a bit of overhead so bytemuck should be preferred whenever possible.
    let a_buf = GpuVector::encase(gpu.device(), &a_data, BufferUsages::STORAGE);
    let b_buf = GpuVector::init(gpu.device(), &b_data, BufferUsages::STORAGE);

    // Queue the operation.
    let mut queue = KernelInvocationQueue::new(gpu.device());
    KernelInvocationBuilder::new(&mut queue, &kernel.main)
        .bind0([a_buf.buffer(), b_buf.buffer()])
        .queue(LEN.div_ceil(64));

    // Encode & submit the operation to the gpu.
    let mut encoder = gpu.device().create_command_encoder(&Default::default());
    queue.encode(&mut encoder, None);
    gpu.queue().submit(Some(encoder.finish()));

    Ok(())
}