DownloadBuffer: Improve usability and documentation #6426
Huh, yeah, the doc for Device::poll being outdated is a straight-up bug. Not sure how much clearer than "no-op" it can be: the call does literally nothing when targeting WebGPU. Maybe it should spell it out like that, with less jargon :)
It's likely possible to do this on native, but afaik (and from a brief scroll over the relevant spec parts) there's no way to do this on WebGPU. There is no guarantee of when the promise will be resolved (== the callback will be called). That said, "polling until the callback is executed" is fairly easy, isn't it? Just use a channel in the callback to signal that enough polling has been done.
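The "channel in the callback" pattern can be sketched with std alone. Note this is a hedged, self-contained mock: `poll_until_signalled` and its fake callback queue are hypothetical stand-ins for wgpu's `device.poll(Maintain::Poll)` and a `map_async`-style callback, so the sketch runs anywhere.

```rust
use std::sync::mpsc;

/// Poll a mocked "device" until the registered callback has signalled
/// completion through a channel; returns how many poll iterations ran.
/// `extra_submissions` mimics unrelated work pending alongside ours.
fn poll_until_signalled(extra_submissions: usize) -> usize {
    let (sender, receiver) = mpsc::channel();

    // Fake queue of pending callbacks; ours sends on the channel when run.
    // (In real wgpu this would be the closure passed to `map_async`.)
    let mut pending: Vec<Box<dyn FnOnce()>> = Vec::new();
    pending.push(Box::new(move || {
        sender.send(()).expect("receiver alive");
    }));
    // Other pending work that happens to complete before our callback runs.
    for _ in 0..extra_submissions {
        pending.push(Box::new(|| {}));
    }

    let mut polls = 0;
    loop {
        // Stand-in for `device.poll(Maintain::Poll)`: finishes one unit of work.
        if let Some(callback) = pending.pop() {
            callback();
        }
        polls += 1;
        // The channel tells us the callback fired, so we stop polling here.
        if receiver.try_recv().is_ok() {
            break;
        }
    }
    polls
}

fn main() {
    // Two unrelated pending submissions => three polls until our callback fires.
    println!("polled {} times", poll_until_signalled(2));
}
```

The point of the channel is exactly that the loop stops as soon as the callback has run, rather than polling forever or guessing a timeout.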
I mean, people do all kinds of
Hmm. I guess that's possible. But considering that there exists the
I don't really know how this is supposed to work, but this being the way it works is one of very few possible explanations for everything I've seen. This is also regarding:
So, in the docs for
So unless map_async creates some kind of hidden SubmissionIndex, or otherwise integrates into wgpu, doing a
I mean, yeah-ish, but on WebGPU I don't have to poll at all. With that I already have a solution that works on both native and web: a channel, similar to what you described. In general it'd be nice to be able to somehow integrate with a Rust async runtime on native, btw. But that's admittedly a big ask. Also, more notes on the documentation: the documentation of
#6360 is actually adding a submission index to map_async. There are still some open questions there though; I haven't looked deeply into it yet.
Something I'd like to understand better is what a good cross-platform application would look like. Let's say wgpu were to expose the submission index on native and just a useless placeholder on WebGPU; how would you use that in an application that has to support both? In the end you'll always end up with continuous polling until the callback fires either way. In other words, I can't entirely get my head around how the submission index would help 🤔. For a pure native app and/or compute workloads it might be nice to have, I guess.
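For illustration, one hedged sketch of how such a cross-platform app could abstract the difference: everything here is mocked (the `Backend` trait, `NativeMock`, `WebMock` are all hypothetical), only the idea of a `SubmissionIndex` corresponds to wgpu, and the "web poll is a no-op" behavior mirrors the docs discussed above.

```rust
// Hypothetical submission handle: meaningful on native, a placeholder on web.
#[derive(Clone, Copy)]
struct SubmissionIndex(u64);

trait Backend {
    // Returns true once the given submission has completed.
    fn poll(&mut self, wait_for: SubmissionIndex) -> bool;
}

// Native mock: each poll call drives one more submission to completion.
struct NativeMock {
    last_done: u64,
}
impl Backend for NativeMock {
    fn poll(&mut self, wait_for: SubmissionIndex) -> bool {
        self.last_done += 1; // pretend one submission finishes per poll
        self.last_done >= wait_for.0
    }
}

// Web mock: polling is a no-op; the "browser" drives the work on its own.
struct WebMock;
impl Backend for WebMock {
    fn poll(&mut self, _wait_for: SubmissionIndex) -> bool {
        true // nothing to do: completion is signalled via the callback/Promise
    }
}

// The application code is identical on both targets: poll until done.
// Returns the number of poll calls made.
fn wait_for_submission(backend: &mut dyn Backend, idx: SubmissionIndex) -> u32 {
    let mut polls = 1;
    while !backend.poll(idx) {
        polls += 1;
    }
    polls
}

fn main() {
    let mut native = NativeMock { last_done: 0 };
    println!("native polls: {}", wait_for_submission(&mut native, SubmissionIndex(3)));
    let mut web = WebMock;
    println!("web polls: {}", wait_for_submission(&mut web, SubmissionIndex(3)));
}
```

This also shows the concern above: the shared code still contains a poll loop either way; the submission index only bounds how much native work the loop waits for.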
By WebGPU spec, for any operation that in the JavaScript API returns a Promise, the C API uses a

Polling until a particular async operation has been completed is then supposed to be done using

When cross-compiling to the Web,

Currently,

Hence the need to have a submission index wherever the WebGPU JavaScript API returns a Promise, e.g.
I see how this makes sense for implementing wgpu-native, so possibly exposing this on wgpu-core would make things a lot easier. All somewhat related to this older issue
The C API also relies on callbacks (clearly, there is no async runtime in C); the term "future" in

So, back to the original question:
What OP wants is that

I believe that the rust

edit: This is somewhat what @cwfitzgerald mentioned in the last message of the discussion you link!
Sorry for taking a bit to come back to this; there was a lot of context added that I wanted to take some time to read first.
To be honest, IDK. I don't have a whole lot of experience with web+native cross-platform development, and in addition my use case might not reflect other users'.
The reason I stumbled upon DownloadBuffer and the connected bits was a compute workload, so you can guess why. If I should at some point submit multiple workloads at the same time, polling with
Since there is currently a single queue per (wgpu) device, egui/eframe also runs on wgpu in my app, and I don't need the added difficulty of slicing my compute into small bits, I currently need to create a separate device anyway. Otherwise the UI becomes entirely unusable whenever I compute something. So nothing else is polling that compute device.
Yeah, that would at least be my expectation if I could await this somehow. So in whatever way I await this: on native it will actually drive progress on the GPU, and thus indeed complete when the work there is done, while on the web it's backed by a Promise, since the browser drives the progress on its own. This is another difference between native and web, but as it stands there are already plenty of differences, and they do actually cost performance in the browser. But I'll just have to live with that.
I was under some misconception of what's happening there when I wrote this. I would have expected

Although it should be noted that the current API of map_async - and by extension

And, just because you expose some async APIs doesn't mean you will be bound to any async runtime. If you don't spawn background tasks on your own or communicate with system APIs (filesystems etc.) in these APIs, you won't need to do anything specific.

Now, why the API is problematic: you're not supposed to block in async tasks, thus calling

Admittedly I can't come up with a reason the result of the callbacks for these particular APIs would have to be not Send, but IDK.

If you've already got an async task for the GPU, like I do in my compute code, then ideally I'd like to have the ability to poll the GPU on that task. In my case I want backpressure there anyway.

Issue #1871 would be a nice option for having more parity between native and web, if the potential drawbacks are worth it to you. However, as I said before, there are currently plenty of other differences, and since they have a reason - giving you more control over what happens when - I don't see them all going away.

Sorry if this is a bit all over the place. I tried to address a few things that came up, and, again, I don't feel like I'm necessarily representative of every user. I'll leave it at this; I hope I was able to bring a bit of my views across.
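The "don't block the async task; poll on a different thread" workaround can be sketched with std only. A hedged mock: the spawned thread stands in for a loop calling wgpu's blocking `device.poll(Maintain::Wait)` plus the `map_async` callback firing, so the block is self-contained.

```rust
use std::sync::mpsc;
use std::thread;

// Moves the blocking "poll" off the async/compute thread onto a dedicated
// poller thread, and signals completion back through a channel.
fn run_with_dedicated_poller() -> &'static str {
    let (tx, rx) = mpsc::channel();

    let poller = thread::spawn(move || {
        // Stand-in for `loop { device.poll(Maintain::Wait); }` together with
        // the map_async callback running once the buffer is mapped:
        tx.send("mapping done").expect("receiver alive");
    });

    // The task side never blocks on the GPU itself; it only waits on the
    // channel (with an async channel, this would be an `.await` instead).
    let msg = rx.recv().expect("poller alive");
    poller.join().expect("poller thread panicked");
    msg
}

fn main() {
    println!("{}", run_with_dedicated_poller());
}
```

This is exactly the "spawn a thread and push the data through a channel" boilerplate mentioned elsewhere in the thread; it works, it's just awkward to have to write it per use site.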
I barely know about Rust's async (coming from the C side of the project), but I tend to think that, whatever the language, async operations need some sort of runtime, either implicit or explicitly managed by having to "poll" somewhere.
Unless
The callbacks are invoked in the thread that calls
That's the point of the WGPUFuture (not to be confused with Rust's Future) that the latest versions of the WebGPU standard use for the native API. My whole presence in this thread is to say "I wish wgpu would comply with this part of the standard" and I think it would solve all your issues as well :) For instance:

```rust
// In the WebGPU standard native API, on_submitted_work_done returns a WGPUFuture
// object that we can internally implement as a submission index (native) or a Promise (web).
let done: WgpuFuture = queue.on_submitted_work_done(|| { do_something(); });

// In native, this blocks until the submission index in `done` is processed.
// In web, this yields back to the browser until the Promise in `done` is resolved.
instance.wait_any([done]);
// ^^^^^^ The do_something() callback above is invoked within this call, right
// after the submission index/Promise is resolved, in the same thread.

do_something_else();
```
@eliemichel I put a bit more thought into my comments than you give me credit for. No hard feelings, just wanted to clarify that. Perhaps I formulated stuff in a way that made it hard to understand what I meant.
Oh yeah, sure. For something more complicated than running a single task, you likely want a full-blown executor or even a runtime to run that program. I was referring to the fact that wgpu, as in the library, maybe doesn't need to bind to any specific runtime; single functions on wgpu's side can be run with minimal executors such as pollster, or the futures crate's block_on. Async Rust is designed so that you can write some parts independently of runtimes. But yeah, thinking about it now, it's likely going to have to integrate with a waker, and these are runtime-specific in Rust.
That's not gonna work. No async runtime in Rust plugs into the browser runtime in a way that works, so you're going to end up mixing two runtimes (browser and the Rust runtime), and that's going to mess with the Rust runtime. If you mix runtimes, any yield in one runtime is gonna look exactly like blocking to the other. The way I look at it, when you have blocking system APIs, that essentially just means the OS is your runtime. But that comes with a lack of control and, usually as a result, worse performance.
As I stated, the problem is that I can't call device.poll on the thread the async task runs on. So different thread.
Yeah, sure, I suspected that from the beginning, and when I read the threads that got linked to, that suspicion was confirmed.
I agree that the potential is there. Not in the way you demonstrated, because of the issues mentioned above. But if there were a way to integrate this future-like object with an actual Rust Future, without requiring explicit bindings to any specific runtime, then my wish would be granted. If that's not possible, in the end that's fine for me; I'll just keep doing things the way I currently do, and I don't think that's gonna cost me any significant performance in my specific use case. I just wanted to point out the issues the current design could have if the API is used with high frequency and/or low latency requirements. And that for all programmers it's going to be at least a bit awkward to deal with, having to go through the boilerplate of spawning a thread / calling block_on and then putting the data through a channel, etc.
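For what it's worth, wrapping a callback-based completion signal into a Rust Future without binding to any particular runtime is possible with std alone. A rough, self-contained sketch - none of this is wgpu API: the `complete` closure stands in for a map_async-style callback, and `block_on` is a pollster-style minimal executor included only so the example runs.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

// Shared state between the callback side and the Future side.
struct Shared<T> {
    result: Option<T>,
    waker: Option<Waker>,
}

// A Future that completes when the paired closure is called. It stores the
// most recent waker, so *any* runtime polling it gets woken correctly.
struct CallbackFuture<T>(Arc<Mutex<Shared<T>>>);

impl<T> CallbackFuture<T> {
    fn new() -> (Self, impl FnOnce(T)) {
        let shared = Arc::new(Mutex::new(Shared { result: None, waker: None }));
        let completer = {
            let shared = shared.clone();
            move |value: T| {
                let mut s = shared.lock().unwrap();
                s.result = Some(value);
                if let Some(w) = s.waker.take() {
                    w.wake(); // tell whatever runtime is driving us to re-poll
                }
            }
        };
        (CallbackFuture(shared), completer)
    }
}

impl<T> Future for CallbackFuture<T> {
    type Output = T;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<T> {
        let mut s = self.0.lock().unwrap();
        match s.result.take() {
            Some(v) => Poll::Ready(v),
            None => {
                s.waker = Some(cx.waker().clone());
                Poll::Pending
            }
        }
    }
}

// Minimal pollster-style executor using thread parking as the waker.
struct ThreadWaker(thread::Thread);
impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(v) => return v,
            Poll::Pending => thread::park(),
        }
    }
}

fn main() {
    let (future, complete) = CallbackFuture::new();
    // Stand-in for wgpu invoking a map_async callback from another thread.
    thread::spawn(move || complete(42u32));
    println!("got {}", block_on(future));
}
```

The key point: the wrapper itself knows nothing about any executor; it only talks to the `Waker` handed to it, which is the runtime-agnostic contract std's Future defines.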
Ow, sorry for this feeling. I replied along the way as I was reading, without a proper introduction to my message. My bad, and apologies!
I may have incorrectly interpreted @Wumpf's answer above, but I was under the impression that they want to avoid having wgpu users even need to decide on a runtime at all, by having the API not expose anything async. But again, I'm not used to the Rust API and its various future runtimes (aren't they built on top of the browser's runtime when cross-compiling to WebAssembly?). My mental model is based on how I see WebGPU implemented in C/emscripten (e.g., an explicit call
For what it's worth, Rust WebAssembly projects generally don't yield like this, because it usually means something like pausing the entire WebAssembly module while waiting for a promise to resolve (e.g., emscripten's Asyncify transform or JSPI, https://emscripten.org/docs/porting/asyncify.html). Even in Emscripten, people should generally avoid doing that where possible because of how it blocks otherwise (e.g., new projects that can await promises the usual way).
Nah, honestly I should say sorry. I misinterpreted your lack of knowledge about Rust's async ecosystem as you explaining things to me thinking I didn't understand them. Sorry.
Hmm, I'm not sure; you might be right though. However, offering an async API doesn't mean that you can't also offer synchronous alternatives (e.g. the current callback + poll on native, or maybe a variant that blocks and returns a RAII guard for access instead). And there are simple (low-dependency) solutions for executing a single async task without requiring a full executor: the previously mentioned pollster, and on the web there's wasm_bindgen_futures. My interpretation was that wgpu would be bound to a certain runtime; this is something of an issue with Rust async. Rust's async abstracts over awaiting, rebuilding that into a state machine, but other than generating state machines it doesn't offer any other abstractions, like spawning tasks or combining different Futures in different ways (first one wins, merge, etc.). This means that many libraries, when they want to offer any async functions, need to decide on one of the runtimes. Generally tokio is supported, but tokio doesn't work for all use cases - that was the reason async Rust doesn't bring its own standard runtime - so that's... annoying.
I should state that there are quite a few crates that promise to plug into browser APIs. But most of them are abandoned, and they were web-specific. There are just subtle difficulties in mapping things to work equivalently in the browser and natively: you can't block the main thread in the browser, timers tend to work differently than system timers (and may work differently depending on context, with different precisions), etc. The prominent runtimes just end up with very limited support on WASM; tokio there doesn't support sleeping, and doesn't support its actual runtimes. You can use the simpler APIs, like join and merge, async channels and other synchronization primitives (though using them improperly on the browser main thread will panic). Even with runtimes that 'use the browser runtime as their runtime', they probably still would need to be explicitly called; just binding to a browser promise directly will likely still break the runtime. And even so: assuming there were a runtime that somehow managed to work well with code using the browser API directly, if wgpu were to use that approach, it would mean wgpu only works with that specific runtime.
Thanks for the details 🙏 I still need to read more about Rust async, but meanwhile this makes me keep thinking that my (possibly wrong) interpretation of @Wumpf's answer is the safest way to go, namely not exposing anything async and letting third parties connect the dots for their preferred async runtime (if that is possible to do?). Is it possible, when a Rust codebase has explicit continuation state management and no async at all, to somehow "hand" the state management to an async runtime without requiring changes to the APIs that are used? If so, that would be a good sign for keeping the wgpu API async-less. The way I tend to use the WebGPU API in C is to place the whole application's logic in a

(Okay, now I want to try giving a shot at running the new C++ coroutines in a Web-ready way)
I seem to have forgotten that wgpu already has APIs that return (Rust) futures, which are hard to avoid (

So honestly, unless they get changed to work in different ways, if it were (!) possible to have
The core of my issue wasn't that it's a bit more effort for the programmer to do things, but that doing things in a performant way - at least without unsafe or potentially messing up the runtime - isn't possible, as far as I can see.
Unless they interface with the lower-level Future APIs, no, AFAIK.
**Is your feature request related to a problem? Please describe.**

I recently tried to use `DownloadBuffer`, but stumbled upon some issues with the callback not actually being executed. After looking at the source code, specifically the usage of `DownloadBuffer::map_async` and `copy_buffer_to_buffer`, and the documentation of `map_async`, things clarified for me.

However I still don't have a solution that's entirely satisfying to me. And regardless of getting that, the documentation of `DownloadBuffer` and/or `DownloadBuffer::read_buffer` really should mention or reference these considerations too. It's most likely to be used by beginners, and they are the most likely to stumble upon the issues this causes.

Now regarding a satisfying solution: I think ideally we should be able to 'poll until the callback was executed (and no further)'. But that's not possible right now. We have no way of getting the `SubmissionIndex` returned by the call to `copy_buffer_to_buffer`, and I'm not even sure whether doing `device_arc.poll(wgpu::Maintain::WaitForSubmissionIndex(subm_idx))` after the call to `DownloadBuffer::read_buffer` would be sufficient (?).

**Describe the solution you'd like**
At the very least, the documentation of `DownloadBuffer::read_buffer` should contain some hint at the issue. Ideally we would also be provided a way of polling until the callback is executed, e.g. by returning the `SubmissionIndex` created in the call to `copy_buffer_to_buffer` from `read_buffer`, if that is sufficient. If not, some other way would be appreciated.

**Describe alternatives you've considered**
I can not create my own variant of `DownloadBuffer::read_buffer` without patching wgpu, or making considerable changes to its logic, as that method uses a lot of private fields, methods, etc. I'm perfectly willing to patch wgpu for experimentation, but that's not a permanent solution. Meanwhile, using `device_arc.poll(wgpu::Maintain::Wait)` might, from my understanding, wait a lot longer than required if another thread happened to submit a long job in the meantime.

**Additional context**
Note that this behavior can only be reproduced as-is on native; on the web, the browser polls WebGPU in its event loop.

Also, I noticed that the documentation of `Device::poll` is outdated. It doesn't seem problematic to me, but I thought I'd let you know. The documentation states that this method returns a boolean, when it does in fact return an enum. Also, the documentation does not mention `Maintain::WaitForSubmissionIndex` at all. And the behavior on the web could be clarified: I assume the text means that polling will never block; however, since the sentence uses terms loosely - concretely, what "no-op" means - it could also be interpreted as blocking until completion but, in contrast to native, not actively driving it.
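The "`Maintain::Wait` might wait longer than required" concern raised in the issue can be illustrated with a small simulated device. Only the `Maintain` variant names mirror wgpu's; the device, its tick costs, and the return value are all hypothetical.

```rust
// Variant names mirror wgpu's `Maintain`; everything else here is a mock.
enum Maintain {
    Wait,                        // wait for *all* queued submissions
    WaitForSubmissionIndex(u64), // wait only up to the given submission
}

// Simulated device with a queue of pending submissions, oldest first.
struct MockDevice {
    // Cost (in fake "ticks") of each pending submission.
    pending: Vec<u64>,
}

impl MockDevice {
    // Returns how many ticks the poll "blocked" for.
    fn poll(&mut self, maintain: Maintain) -> u64 {
        let count = match maintain {
            Maintain::Wait => self.pending.len(),
            Maintain::WaitForSubmissionIndex(idx) => {
                (idx as usize + 1).min(self.pending.len())
            }
        };
        self.pending.drain(..count).sum()
    }
}

fn main() {
    // Our small download copy is submission 0; another thread queued a big job.
    let mut device = MockDevice { pending: vec![1, 1000] };
    // Waiting only for our own submission costs 1 tick, not 1001.
    let ticks = device.poll(Maintain::WaitForSubmissionIndex(0));
    println!("blocked for {ticks} tick(s)");
}
```

With `Maintain::Wait` on the same queue the poll would "block" for 1001 ticks, which is exactly the scenario described under "alternatives considered".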