# Buffers initialization
The code described in this section can be run from the `encase.rs` example with `cargo run --features derive --example encase`. The complete source code is also provided at the end of this page.
`wgcore` exposes some utilities to ease the creation and initialization of `wgpu::Buffer`. All the structs mentioned
below are actually aliases for the more general `GpuTensor`, an n-dimensional array of values:

- `GpuScalar`: a storage buffer with a single element.
- `GpuVector`: a 1-dimensional array of values.
- `GpuMatrix`: a 2-dimensional array of values.
Initializing WebGPU buffers can be tricky since the buffer layout must match the layout of the corresponding struct
defined in the WGSL shader. To facilitate the conversion between Rust structures and raw bytes, `wgcore` can leverage
either `bytemuck` or `encase`.
## Initialization with bytemuck
Initialization with `bytemuck` can be done using the `::init` constructor of `GpuScalar`/`GpuVector`/`GpuMatrix`.

Note that not all types are capable of implementing the `bytemuck::Pod` trait due to alignment and padding restrictions.
If you can't derive `bytemuck::Pod` for your own type, consider the solution based on `encase` instead.
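To see why padding rules out a `Pod` derive, here is a minimal, dependency-free sketch. The `Vec4` type below is a hypothetical stand-in that mimics the 16-byte alignment a `vec4<f32>` field has in WGSL:

```rust
// Hypothetical stand-in mimicking the 16-byte alignment of WGSL's `vec4<f32>`.
#[repr(C, align(16))]
#[derive(Copy, Clone)]
struct Vec4 {
    x: f32,
    y: f32,
    z: f32,
    w: f32,
}

#[repr(C)]
struct Mixed {
    value: f32, // bytes 0..4
    // 12 bytes of implicit padding are inserted here so that `value2`
    // starts at its required 16-byte alignment. Padding bytes have
    // unspecified content, which is why `bytemuck::Pod` cannot be derived.
    value2: Vec4, // bytes 16..32
}

fn main() {
    // The fields alone only need 20 bytes…
    assert_eq!(
        std::mem::size_of::<f32>() + std::mem::size_of::<Vec4>(),
        20
    );
    // …but the struct occupies 32: the 12 extra bytes are padding.
    assert_eq!(std::mem::size_of::<Mixed>(), 32);
    assert_eq!(std::mem::align_of::<Mixed>(), 16);
}
```

A struct like `Mixed` is exactly the situation where `::encase`, described further below, is the right tool.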
- If a type `T` implements the `bytemuck::Pod` trait, it can be passed to `GpuScalar::init`.
- Any type implementing `AsRef<[T]>` (like `Vec<T>`, `&[T]`, or `DVector` from `nalgebra`) can be passed to `GpuVector::init`.
- Any matrix type, parametrized by `T`, from the `nalgebra` crate can be passed to `GpuMatrix::init`.
```rust
use wgcore::gpu::GpuInstance;
use wgcore::tensor::GpuVector;
use wgpu::BufferUsages;

#[derive(Copy, Clone, bytemuck::Pod, bytemuck::Zeroable)]
#[repr(C)]
pub struct BytemuckStruct {
    value: f32,
}

#[async_std::main]
async fn main() -> anyhow::Result<()> {
    let gpu = GpuInstance::new().await?;
    let data = (0..1000)
        .map(|x| BytemuckStruct { value: x as f32 })
        .collect::<Vec<_>>();
    let gpu_buffer = GpuVector::init(gpu.device(), &data, BufferUsages::STORAGE);
    Ok(())
}
```
Initialization based on `bytemuck` will always be more efficient than initializing with `encase`, as it involves zero conversion overhead.
## Initialization with encase
If the structure cannot implement the `bytemuck::Pod` trait, it is possible to rely on the `encase::ShaderType` trait
instead by calling the `::encase` constructor of `GpuScalar`/`GpuVector`/`GpuMatrix`.
- If a type `T` implements the `encase::ShaderType` trait, it can be passed to `GpuScalar::encase`.
- Any type implementing `AsRef<[T]>` (like `Vec<T>`, `&[T]`, or `DVector` from `nalgebra`) can be passed to `GpuVector::encase`.
- Any matrix type, parametrized by `T`, from the `nalgebra` crate can be passed to `GpuMatrix::encase`.
```rust
use nalgebra::Vector4;
use wgcore::gpu::GpuInstance;
use wgcore::tensor::GpuVector;
use wgpu::BufferUsages;

#[derive(Copy, Clone, encase::ShaderType)]
#[repr(C)]
pub struct EncaseStruct {
    value: f32,
    // This implies some internal padding, so we can’t rely on bytemuck.
    // Encase will handle that properly.
    value2: Vector4<f32>,
}

#[async_std::main]
async fn main() -> anyhow::Result<()> {
    let gpu = GpuInstance::new().await?;
    const LEN: u32 = 1000;
    let data = (0..LEN)
        .map(|x| EncaseStruct {
            value: x as f32,
            value2: Vector4::repeat(x as f32 * 10.0),
        })
        .collect::<Vec<_>>();
    let gpu_buffer = GpuVector::encase(gpu.device(), &data, BufferUsages::STORAGE);
    Ok(())
}
```
The `::encase` function will result in an extra allocation and a conversion of all elements in the provided vector or
matrix. Because of this overhead, it is recommended to initialize with `bytemuck` whenever possible.
## Complete example

**main.rs**

```rust
#[cfg(not(feature = "derive"))]
std::compile_error!(
    r#"
###############################################################
## The `derive` feature must be enabled to run this example. ##
###############################################################
"#
);

use nalgebra::Vector4;
use wgcore::gpu::GpuInstance;
use wgcore::kernel::{KernelInvocationBuilder, KernelInvocationQueue};
use wgcore::tensor::GpuVector;
use wgcore::Shader;
use wgpu::{BufferUsages, ComputePipeline};

#[derive(Copy, Clone, PartialEq, Debug, Default, bytemuck::Pod, bytemuck::Zeroable)]
#[repr(C)]
pub struct BytemuckStruct {
    value: f32,
}

#[derive(Copy, Clone, PartialEq, Debug, Default, encase::ShaderType)]
#[repr(C)]
pub struct EncaseStruct {
    value: f32,
    // This implies some internal padding, so we can’t rely on bytemuck.
    // Encase will handle that properly.
    value2: Vector4<f32>,
}

#[derive(Shader)]
#[shader(src = "encase.wgsl", composable = false)]
struct ShaderEncase {
    main: ComputePipeline,
}

#[async_std::main]
async fn main() -> anyhow::Result<()> {
    // Initialize the gpu device and its queue.
    //
    // Note that `GpuInstance` is just a simple helper struct for initializing the gpu resources.
    // You are free to initialize them independently if more control is needed, or reuse the ones
    // that were already created/owned by e.g., a game engine.
    let gpu = GpuInstance::new().await?;

    // Load and compile our kernel. The `from_device` function was generated by the `Shader` derive.
    // Note that its dependency on `Composable` is automatically resolved by the `Shader` derive
    // too.
    let kernel = ShaderEncase::from_device(gpu.device())?;

    // Create the buffers.
    const LEN: u32 = 1000;
    let a_data = (0..LEN)
        .map(|x| EncaseStruct {
            value: x as f32,
            value2: Vector4::repeat(x as f32 * 10.0),
        })
        .collect::<Vec<_>>();
    let b_data = (0..LEN)
        .map(|x| BytemuckStruct { value: x as f32 })
        .collect::<Vec<_>>();
    // Call `encase` instead of `init` because `EncaseStruct` isn’t `Pod`.
    // The `encase` function has a bit of overhead so bytemuck should be preferred whenever possible.
    let a_buf = GpuVector::encase(gpu.device(), &a_data, BufferUsages::STORAGE);
    let b_buf = GpuVector::init(gpu.device(), &b_data, BufferUsages::STORAGE);

    // Queue the operation.
    let mut queue = KernelInvocationQueue::new(gpu.device());
    KernelInvocationBuilder::new(&mut queue, &kernel.main)
        .bind0([a_buf.buffer(), b_buf.buffer()])
        .queue(LEN.div_ceil(64));

    // Encode & submit the operation to the gpu.
    let mut encoder = gpu.device().create_command_encoder(&Default::default());
    queue.encode(&mut encoder, None);
    gpu.queue().submit(Some(encoder.finish()));
    Ok(())
}
```

**kernel.wgsl**

```wgsl
@group(0) @binding(0)
var<storage, read_write> a: array<EncaseStruct>;
@group(0) @binding(1)
var<storage, read> b: array<BytemuckStruct>;

struct BytemuckStruct {
    value: f32,
}

struct EncaseStruct {
    value: f32,
    value2: vec4<f32>,
}

@compute @workgroup_size(64, 1, 1)
fn main(@builtin(global_invocation_id) invocation_id: vec3<u32>) {
    let i = invocation_id.x;
    if i < arrayLength(&a) {
        a[i].value += b[i].value;
        a[i].value2 += vec4(b[i].value);
    }
}
```
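Note how the example dispatches `LEN.div_ceil(64)` workgroups: the element count is rounded up to a whole number of 64-wide workgroups, and the `if i < arrayLength(&a)` guard in the kernel masks the surplus invocations of the last group. A quick sketch of that rounding (the `workgroup_count` helper is illustrative, not part of `wgcore`):

```rust
// Round a 1-D element count up to a whole number of workgroups.
fn workgroup_count(len: u32, workgroup_size: u32) -> u32 {
    len.div_ceil(workgroup_size)
}

fn main() {
    // 1000 elements at 64 threads per workgroup: 15 full groups + 1 partial.
    assert_eq!(workgroup_count(1000, 64), 16);
    // The last group spawns 64 * 16 - 1000 = 24 invocations that the
    // bounds check in the shader must skip.
    assert_eq!(64 * workgroup_count(1000, 64) - 1000, 24);
    // An exact multiple needs no partial group.
    assert_eq!(workgroup_count(1024, 64), 16);
}
```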