Potential sync bug when two compute pipelines access same buffer that is written to via `queue.write_buffer`? #3918

LukasKalbertodt · 2023-07-09T21:36:40Z

I am not sure if this should be an issue instead, but I am very unsure about all of this, so I'm just throwing it out here.

I built a GPU-driven renderer that consists of a compute pass, which culls whole clusters, and a render pass that renders the surviving clusters via multi-draw indirect. Roughly what is described here. Three buffers are important:

- cluster_buffer: contains a list of clusters. Written by the CPU and read by the compute pass.
- count_buffer: only 4 bytes large. Cleared by the CPU and used in the compute pass by atomic_add()ing to it, using the returned number as an index into the indirect buffer.
- indirect_buffer: contains a list of draw calls. Written by the compute pass.

The actual draw call is multi_draw_indexed_indirect_count(indirect_buffer, 0, count_buffer, 0, _). Before each compute pass, the count buffer is cleared. The indirect buffer is not touched, as it's just being overwritten from the start again.
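For readers unfamiliar with the pattern, the interplay between count_buffer and indirect_buffer can be mimicked on the CPU. This is only a rough analogy, not the shader from this project: `cull` and the `visible` test are hypothetical names, threads stand in for GPU invocations, and an `AtomicU32` stands in for the 4-byte count buffer.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Mutex;
use std::thread;

/// CPU stand-in for the culling pass: every visible cluster reserves a slot
/// with an atomic add and writes its id into that slot.
/// `visible` is a hypothetical placeholder for the real culling test.
fn cull(clusters: &[u32], visible: fn(u32) -> bool) -> (u32, Vec<u32>) {
    // Plays the role of count_buffer, already cleared to 0.
    let count = AtomicU32::new(0);
    // Plays the role of indirect_buffer; u32::MAX marks unused slots.
    let indirect = Mutex::new(vec![u32::MAX; clusters.len()]);

    thread::scope(|s| {
        for chunk in clusters.chunks(4) {
            let (count, indirect) = (&count, &indirect);
            s.spawn(move || {
                for &id in chunk {
                    if visible(id) {
                        // The value returned by the atomic add is the slot index.
                        let slot = count.fetch_add(1, Ordering::Relaxed) as usize;
                        indirect.lock().unwrap()[slot] = id;
                    }
                }
            });
        }
    });

    (count.into_inner(), indirect.into_inner().unwrap())
}

fn main() {
    let clusters: Vec<u32> = (0..16).collect();
    let (count, indirect) = cull(&clusters, |id| id % 3 == 0);
    // Slot order depends on thread scheduling, so sort before printing.
    let mut drawn = indirect[..count as usize].to_vec();
    drawn.sort();
    println!("count = {count}, drawn = {drawn:?}");
}
```

The point of the analogy is that the scheme only works if the counter really is zero when the pass starts: any leftover value shifts every slot index and inflates the number of entries the draw call reads.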

Now to the actual problem: for whatever reason, I cleared the count buffer via queue.write_buffer(&count_buffer, 0, &[0u8; 4]) and not via encoder.clear_buffer(&count_buffer, 0, None). The write_buffer approach worked for a while, until I performed multiple (cull + draw) operations per frame (for shadow map generation). Suddenly, I got flickering geometry and lots of other weird effects. Nothing made any sense anymore. After quite a bit of debugging and trying things out, I switched that one line to encoder.clear_buffer and that solved all problems. (Vulkan, Linux, Nvidia)

I'm glad it now works, but I'm not sure if this is something wgpu could improve? Or is this documented undefined behavior and I was simply a dumbo? Could anyone point me to additional resources on this?
On the other hand, if this sounds like a bug, I can provide more information or try to provide a minimal example.

(Thanks a ton for wgpu!!)

kpreid · 2023-07-10T01:09:17Z

If you are submitting multiple {compute; draw;} operations using a single command buffer, then this would be expected behavior. write_buffer() goes on the queue directly, so if you did

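let mut encoder = device.create_command_encoder();
queue.write_buffer();          // staged on the queue, not recorded in `encoder`
encoder.begin_compute_pass();  // cull 1
encoder.begin_render_pass();   // draw 1
queue.write_buffer();          // also staged on the queue
encoder.begin_compute_pass();  // cull 2
encoder.begin_render_pass();   // draw 2
queue.submit([encoder.finish()]); // staged writes run first, then the recorded passes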

then the second compute pass would not see the cleared buffer you wanted; the actual order of operations will be {write; write; compute; render; compute; render;}. Using encoder.clear_buffer() means that the clear operation is a command in the same command buffer, so it is ordered together with the rest of the commands in that command buffer.
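A toy model can make that reordering concrete. The sketch below is not how wgpu is implemented; `ToyQueue`, `ToyEncoder` and `Op` are hypothetical names, and the only behavior modeled is the one described here: queue writes are staged and run at the start of the next submit, while encoder commands keep their recorded order.

```rust
/// One step on the toy GPU timeline.
#[derive(Debug, Clone, PartialEq)]
enum Op {
    Write,
    Clear,
    Compute,
    Render,
}

/// Toy stand-in for a command encoder: records commands in call order.
#[derive(Default)]
struct ToyEncoder {
    commands: Vec<Op>,
}

impl ToyEncoder {
    fn record(&mut self, op: Op) {
        self.commands.push(op);
    }
}

/// Toy stand-in for a queue: writes are staged and flushed at the next submit.
#[derive(Default)]
struct ToyQueue {
    pending_writes: Vec<Op>,
    timeline: Vec<Op>,
}

impl ToyQueue {
    fn write_buffer(&mut self) {
        self.pending_writes.push(Op::Write);
    }

    fn submit(&mut self, encoder: ToyEncoder) {
        // Staged writes go first, then the recorded commands in order.
        self.timeline.append(&mut self.pending_writes);
        self.timeline.extend(encoder.commands);
    }
}

/// Clears issued through the queue while recording passes, as in the snippet above.
fn with_write_buffer() -> Vec<Op> {
    let (mut queue, mut encoder) = (ToyQueue::default(), ToyEncoder::default());
    for _ in 0..2 {
        queue.write_buffer();
        encoder.record(Op::Compute);
        encoder.record(Op::Render);
    }
    queue.submit(encoder);
    queue.timeline
}

/// Clears recorded in the encoder itself.
fn with_clear_buffer() -> Vec<Op> {
    let (mut queue, mut encoder) = (ToyQueue::default(), ToyEncoder::default());
    for _ in 0..2 {
        encoder.record(Op::Clear);
        encoder.record(Op::Compute);
        encoder.record(Op::Render);
    }
    queue.submit(encoder);
    queue.timeline
}

fn main() {
    println!("{:?}", with_write_buffer());
    // [Write, Write, Compute, Render, Compute, Render]
    println!("{:?}", with_clear_buffer());
    // [Clear, Compute, Render, Clear, Compute, Render]
}
```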

LukasKalbertodt Jul 10, 2023
Author

Oh that makes sense! I suppose this part of the write_buffer docs:

> As such, the write is not immediately submitted, and instead enqueued internally to happen at the start of the next submit() call.

... explains what you just said, right? I did read that multiple times but somehow it didn't really click for me until now.

Then I will close this discussion, thanks a lot!

kpreid Jul 10, 2023

I think it's better to look at it from the other end: CommandEncoder and the CommandBuffer it produces have no ordering relationship with the Queue until you submit the buffer to the queue. write_buffer() participates in ordering in the queue, not ordering in the command buffer.

In this view, the note on write_buffer() just means that its execution is deferred until some other submit happens — a fact about performance, not a fact about ordering.
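To illustrate what the ordering means for the count buffer itself, here is a small replay of two timelines. It is a simplified sketch: `Step` and `draw_counts` are hypothetical names, and the survivor counts (3 and 2) are made up. The second timeline is what recording clear_buffer in the encoder produces; assuming the documented "start of the next submit()" behavior, one write_buffer followed by one submit per (cull + draw) pair should yield the same order.

```rust
/// Steps on a simplified GPU timeline. `Cull(n)` stands for a compute pass in
/// which `n` clusters survive; the numbers used below are made up.
#[derive(Clone, Copy)]
enum Step {
    Zero,
    Cull(u32),
    Draw,
}

/// Replays a timeline and returns the count each draw would read.
fn draw_counts(timeline: &[Step]) -> Vec<u32> {
    let mut count = 0u32; // plays the role of count_buffer
    let mut seen = Vec::new();
    for step in timeline {
        match *step {
            Step::Zero => count = 0,
            Step::Cull(survivors) => count += survivors,
            Step::Draw => seen.push(count),
        }
    }
    seen
}

fn main() {
    use Step::*;

    // Both queue writes land before the single submitted command buffer.
    let writes_hoisted = [Zero, Zero, Cull(3), Draw, Cull(2), Draw];
    // Each clear is ordered right before the cull pass it belongs to.
    let clears_in_order = [Zero, Cull(3), Draw, Zero, Cull(2), Draw];

    println!("{:?}", draw_counts(&writes_hoisted)); // [3, 5]
    println!("{:?}", draw_counts(&clears_in_order)); // [3, 2]
}
```

In the first timeline the second draw reads a count of 5 even though only 2 clusters survived the second cull, so it would also consume 3 stale entries left over from the first pass. That would be consistent with the flickering reported above, though this is an inference and not something verified against the original project.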

(Disclaimer: I am not confident I completely understand wgpu/WebGPU semantics. This description is just from the documentation and having used it — not deep knowledge.)
