Timestamp queries
The code described in this section can be run from the
timestamp_queries.rs example with
cargo run --features derive --example timestamp_queries
. Complete source code is also provided at the
end of this page.
Timestamp queries are a bit impractical to use at the moment. Their API is still being worked on.
Timestamp queries aim to answer the question: "How long did this compute operation run on the GPU?" They are essentially a basic, occasionally unreliable, benchmarking tool for measuring shader runtimes. This is achieved by:
- Instructing the GPU to insert timestamps at specific locations of the execution pipeline.
- Instructing the GPU to resolve the queries.
- Reading back the timestamp values from GPU memory to CPU land.
- Converting the timestamps to milliseconds and subtracting them to obtain the runtimes.
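The arithmetic behind step 4 can be sketched as follows. Raw timestamps are counts of GPU ticks; wgpu's `Queue::get_timestamp_period()` reports the tick duration in nanoseconds, so a runtime in milliseconds is `(end - start) * period / 1e6`. Note that `GpuTimestamps::wait_for_results_ms` (shown below) performs this conversion for you; this standalone sketch, with hypothetical tick values, only illustrates the math:

```rust
// Sketch of step 4 with made-up numbers: convert two raw timestamp ticks
// into a runtime in milliseconds. On real hardware the tick period comes
// from `wgpu::Queue::get_timestamp_period()` (nanoseconds per tick).
fn ticks_to_ms(start_ticks: u64, end_ticks: u64, period_ns: f32) -> f64 {
    (end_ticks - start_ticks) as f64 * period_ns as f64 / 1.0e6
}

fn main() {
    // Hypothetical values: a 1 ns tick period, pass spanning 2_500_000 ticks.
    let ms = ticks_to_ms(1_000_000, 3_500_000, 1.0);
    println!("{ms}"); // 2.5
}
```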
Platform support for timestamp queries can be very limited, so it is advised not to abort the execution of your application if they are not supported.
Be sure to enable the wgpu::Features::TIMESTAMP_QUERY
feature when initializing your gpu device
(in DeviceDescriptor::required_features
).
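If you initialize the device manually rather than through `GpuInstance`, the feature can be requested along these lines. This is only a sketch: the exact `DeviceDescriptor` fields and `request_device` signature vary slightly across wgpu versions, so adjust it to the version you use.

```rust
// Sketch: request the TIMESTAMP_QUERY feature only when the adapter supports
// it, so the application keeps running (without timings) otherwise.
// Field names follow recent wgpu versions; adjust for the version you use.
async fn request_device_with_timestamps(
    adapter: &wgpu::Adapter,
) -> Result<(wgpu::Device, wgpu::Queue), wgpu::RequestDeviceError> {
    let supported = adapter
        .features()
        .contains(wgpu::Features::TIMESTAMP_QUERY);
    let required_features = if supported {
        wgpu::Features::TIMESTAMP_QUERY
    } else {
        // Fall back to no extra features instead of aborting.
        wgpu::Features::empty()
    };
    adapter
        .request_device(
            &wgpu::DeviceDescriptor {
                label: Some("device with timestamp queries"),
                required_features,
                ..Default::default()
            },
            None,
        )
        .await
}
```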
The wgcore
crate exposes the GpuTimestamps
structure that helps with all four steps mentioned above:
// Initialize a query set that can contain up to 2 timestamps
// (here we need only 2 timestamps since we are measuring a single compute pass).
let mut timestamps = GpuTimestamps::new(gpu.device(), 2);
// Initialize the invocation queue as usual.
let mut queue = KernelInvocationQueue::new(gpu.device());
// ## Step 1 ##
// Start a compute pass with a custom name. The `true` indicates that timestamps
// will have to be pushed for this pass.
queue.compute_pass("timestamp_queries_test", true);
// TODO: queue one or multiple kernels here.
let mut encoder = gpu.device().create_command_encoder(&Default::default());
// ## STEP 1 (bis) ##
// When encoding the queue, pass the `GpuTimestamps` to actually create our timestamps.
// Note that if `None` is given instead, timestamps will be ignored.
queue.encode(&mut encoder, Some(&mut timestamps));
// ## STEP 2 ##
// Resolve the timestamp queries.
timestamps.resolve(&mut encoder);
// Submit the calculations.
gpu.queue().submit(Some(encoder.finish()));
// ## STEP 3 & 4 ##
// Read back the timestamp values, in milliseconds.
// The compute pass’s runtime is given by `timestamps_read[1] - timestamps_read[0]`.
let timestamps_read = timestamps.wait_for_results_ms(gpu.device(), gpu.queue());
Each compute pass requires 2 timestamps for measuring its runtime: one for when the pass starts, and one for when it ends. The actual runtime of the compute pass is the difference between the two.
Note that there is currently no way to know which timestamp is related to which compute pass unless you know exactly in
which order all the calls to queue.compute_pass
happened: their timestamps will be in the same order.
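Concretely, since the pass declared by the `i`-th call to `queue.compute_pass` (counting from zero) owns timestamps `2 * i` (start) and `2 * i + 1` (end), the per-pass runtimes can be recovered from the flat readback array. A minimal sketch with hypothetical millisecond values:

```rust
// Sketch: given timestamps (in ms) read back for several compute passes,
// recover each pass's runtime. Pass `i` (in `compute_pass` call order) owns
// timestamps `2 * i` (start) and `2 * i + 1` (end).
fn pass_runtimes_ms(timestamps_ms: &[f64]) -> Vec<f64> {
    timestamps_ms
        .chunks_exact(2)
        .map(|pair| pair[1] - pair[0])
        .collect()
}

fn main() {
    // Hypothetical readback for two passes: [start0, end0, start1, end1].
    let read = [0.0, 1.5, 1.5, 4.0];
    println!("{:?}", pass_runtimes_ms(&read)); // [1.5, 2.5]
}
```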
The examples in this page only work in a native program. On WASM, reading the timestamps from the gpu can be a bit more
convoluted due to the async nature of web platforms. See how it is done in the wgsparkl
crate as an example.
Complete example
- main.rs
- kernel.wgsl
#[cfg(not(feature = "derive"))]
std::compile_error!(
r#"
###############################################################
## The `derive` feature must be enabled to run this example. ##
###############################################################
"#
);
use nalgebra::{DVector, Vector4};
use std::ops::Div;
use wgcore::composer::ComposerExt;
use wgcore::gpu::GpuInstance;
use wgcore::hot_reloading::HotReloadState;
use wgcore::kernel::{KernelInvocationBuilder, KernelInvocationQueue};
use wgcore::tensor::{GpuScalar, GpuVector};
use wgcore::timestamps::GpuTimestamps;
use wgcore::Shader;
use wgpu::{BufferUsages, ComputePipeline};
#[derive(Shader)]
#[shader(src = "timestamp_queries.wgsl", composable = false)]
struct ShaderTimestampQueries {
main: ComputePipeline,
}
#[async_std::main]
async fn main() -> anyhow::Result<()> {
// Initialize the gpu device and its queue.
//
// Note that `GpuInstance` is just a simple helper struct for initializing the gpu resources.
// You are free to initialize them independently if more control is needed, or reuse the ones
// that were already created/owned by e.g., a game engine.
let gpu = GpuInstance::new().await?;
// Load and compile our kernel. The `from_device` function was generated by the `Shader` derive.
// Note that its dependency to `Composable` is automatically resolved by the `Shader` derive
// too.
let mut kernel = ShaderTimestampQueries::from_device(gpu.device())?;
// Create the buffers.
const LEN: u32 = 2_000_000;
let buffer = GpuVector::init(
gpu.device(),
vec![0u32; LEN as usize],
BufferUsages::STORAGE | BufferUsages::COPY_SRC,
);
// Init hot-reloading.
// We are setting up hot-reloading so that we can change some elements in the shader
// (like the iteration count) and see how that affects performance live.
let mut hot_reload = HotReloadState::new()?;
ShaderTimestampQueries::watch_sources(&mut hot_reload)?;
// Init timestamp queries.
// To measure the time of one kernel, we need two timestamps (one for when it starts and one for
// when it ends).
let mut timestamps = GpuTimestamps::new(gpu.device(), 2);
// Queue the operation.
println!("#############################");
println!("Edit the file `timestamp_queries.wgsl` (for example by multiplying or dividing NUM_ITERS by 10).\nThe updated runtime will be printed below whenever a change is detected.");
println!("#############################");
for loop_id in 0.. {
// Detect & apply changes.
hot_reload.update_changes();
match kernel.reload_if_changed(gpu.device(), &hot_reload) {
Ok(changed) => {
if changed {
// Clear the timestamps to reuse in the next loop.
timestamps.clear();
// We detected a change (or this is the first loop).
// Read the result.
let mut queue = KernelInvocationQueue::new(gpu.device());
// Declare a compute pass with timestamps enabled.
queue.compute_pass("timestamp_queries_test", true);
KernelInvocationBuilder::new(&mut queue, &kernel.main)
.bind0([buffer.buffer()])
.queue(LEN.div_ceil(64));
// Encode & submit the operation to the gpu.
let mut encoder = gpu.device().create_command_encoder(&Default::default());
// Run our kernel.
queue.encode(&mut encoder, Some(&mut timestamps));
// Resolve the timestamp queries.
timestamps.resolve(&mut encoder);
gpu.queue().submit(Some(encoder.finish()));
// Read and print the kernel’s runtime.
let timestamps_read = timestamps.wait_for_results_ms(gpu.device(), gpu.queue());
println!(
"Current run time: {}ms",
timestamps_read[1] - timestamps_read[0]
);
}
}
Err(e) => {
// Hot-reloading failed, likely due to a syntax error in the shader.
println!("Hot reloading error: {:?}", e);
}
}
}
Ok(())
}
@group(0) @binding(0)
var<storage, read_write> a: array<u32>;
@compute @workgroup_size(64, 1, 1)
fn main(@builtin(global_invocation_id) invocation_id: vec3<u32>) {
let i = invocation_id.x;
if i < arrayLength(&a) {
const NUM_ITERS: u32 = 10000u;
for (var k = 0u; k < NUM_ITERS; k++) {
a[i] = collatz_iterations(a[i] * 7919);
}
}
}
// This is taken from the wgpu "hello_compute" example:
// https://github.com/gfx-rs/wgpu/blob/6f5014f0a3441bcbc3eb4223aee454b95904b087/examples/src/hello_compute/shader.wgsl
// (Apache 2 / MIT license)
//
// The Collatz Conjecture states that for any integer n:
// If n is even, n = n/2
// If n is odd, n = 3n+1
// And repeat this process for each new n, you will always eventually reach 1.
// Though the conjecture has not been proven, no counterexample has ever been found.
// This function returns how many times this recurrence needs to be applied to reach 1.
fn collatz_iterations(n_base: u32) -> u32 {
var n: u32 = n_base;
var i: u32 = 0u;
loop {
if (n <= 1u) {
break;
}
if (n % 2u == 0u) {
n = n / 2u;
}
else {
// Overflow? (i.e. 3*n + 1 > 0xffffffffu?)
if (n >= 1431655765u) { // 0x55555555u
return 4294967295u; // 0xffffffffu
}
n = 3u * n + 1u;
}
i = i + 1u;
}
return i;
}