Wgpu22 23% slower than wgpu 0.20 on my render-bench. #6434
Comments
Pictures of traces (images omitted): In the first trace, note the lack of red bars at the top; all frames have consistent time. For this part of the run, render-bench is just refreshing the screen, with no changes. In the second trace, note the red bars at the top. Those are slow frames, and they're intermittent. This is a source of jank in a simple situation that should be jank-free.
Zoom in on the jank trouble spot. Most of this seems to be outside WGPU. Yet all that was changed was to convert Rend3 to use WGPU 22, with the required version changes for EGUI and WINIT. This benchmark doesn't use EGUI, and WINIT isn't doing anything in the middle of a frame. On a few frames I saw what seemed to be an intermittent lock stall in VkQueuePresentKHR. That didn't happen with WGPU 0.20, but it's not where much of the time is going. Did something get bigger that's forcing cache misses?
(All this was tested on native desktop, not on the web.)
Regarding locking, there were some changes after 0.20.* in which we discovered and fixed some UB with Vulkan synchronization. They're in the
Where is that? I'm looking in https://github.com/gfx-rs/wgpu/blob/trunk/CHANGELOG.md and not finding any relevant changes later than 0.19, which predates this regression.
Ah, shoot, I made an off-by-one error; the PR I was thinking of is @cwfitzgerald's #5681, but that should be released as of 0.20. 🫤 I don't think I have any relevant information any more, sorry! |
OK, thanks. For some slow frames, there's about 0.5ms of stall at the beginning of the frame, in vkQueuePresentKHR. Usually that's so fast it barely shows up in profiling, but sometimes it takes longer. No idea why; the program at this point is rendering the same image over and over. That's a source of jank, but it's not where most of the excess time is going. Some time goes into frustum culling and depth-sorting objects. Sorting does not seem to be slower, though, so although that may be a performance issue, it's not this problem. There's a call to .get_mapped_range_mut(). Is it possible to get a stall there if something is happening GPU-side? I need to add more profiling calls in Rend3 to pin this down.
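For reference, a minimal sketch of putting a profiling span around that map access, so a stall there would show up as its own zone. This is not Rend3's actual code; it assumes a staging buffer created with `MAP_WRITE` and `mapped_at_creation: true` (so `get_mapped_range_mut()` is valid) and that the tracy-client crate is in use:

```rust
// Hypothetical sketch: bracket the .get_mapped_range_mut() access with a
// profiling span to see whether any stall is in the map access itself.
// Assumes `staging` was created with wgpu::BufferUsages::MAP_WRITE | COPY_SRC,
// mapped_at_creation: true, and is at least data.len() bytes long.
fn write_staging(staging: &wgpu::Buffer, data: &[u8]) {
    let _span = tracy_client::span!("write staging buffer");
    {
        let mut view = staging.slice(..).get_mapped_range_mut();
        view[..data.len()].copy_from_slice(data);
    } // the mapped view must be dropped before unmap()
    staging.unmap();
}
```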
Much of the additional time is going into set_bind_group. Did that get slower?
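For context, this is roughly the per-draw pattern whose cost is in question: one bind group per material, re-bound for every object, so any per-call slowdown in set_bind_group multiplies across the scene. A generic sketch, not render-bench's or Rend3's actual code; the `Object` fields are illustrative:

```rust
// Illustrative only: one set_bind_group call per drawn object.
fn draw_objects<'a>(pass: &mut wgpu::RenderPass<'a>, objects: &'a [Object]) {
    for obj in objects {
        pass.set_bind_group(1, &obj.material_bind_group, &[]);
        pass.set_vertex_buffer(0, obj.vertex_buffer.slice(..));
        pass.set_index_buffer(obj.index_buffer.slice(..), wgpu::IndexFormat::Uint32);
        pass.draw_indexed(0..obj.index_count, 0, 0..1);
    }
}

// Hypothetical per-object data; field names are made up for the sketch.
struct Object {
    material_bind_group: wgpu::BindGroup,
    vertex_buffer: wgpu::Buffer,
    index_buffer: wgpu::Buffer,
    index_count: u32,
}
```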
From #6419:
which my render-bench is doing. So that's likely to be at least part of the problem. Thanks. How soon can I test against something with #6419 and find out if this really is the problem? While render-bench does do that, the actual application has many more different textures, so there might be something else as well.
There will probably be a release this week. |
Oh, good. Please let me know when there's a branch I can test. Thanks. |
Actually #6419 is already on trunk if you want to test it early. |
Not sure about the version situation.
Somewhat confused here. Is 22.1.0 being backed out? If a release to crates.io is expected this week, I'm tempted to wait. What will the next version number be? Will there be breaking changes that break egui or winit?
The problem is that the version on trunk never got bumped to 22.1.0 (or beyond); that only happened on a separate branch for the 22.1.0 minor release.
Branch wgp22: This branch is 10 commits ahead of, and 449 commits behind, trunk. Is some merging needed? Could the version on trunk be bumped up to a number greater than 22.1.0? If there are no breaking changes, 22.1.1 would be good. If there are breaking changes that break egui again, that's a problem, because it takes months for that side to catch up.
I think there might be room for adding a merge of non-major releases'
It's now next week. Waiting. |
@John-Nagle: see #6465.
@John-Nagle Your tone here is impatient, which suggests you have a misunderstanding. wgpu is not a product you have paid for, and which would thus have an obligation to address your concerns. Rather, it is a volunteer project which freely provides its output for you to use and contribute to. If you are able to analyze the problem and point out what aspect of wgpu is causing it, that would be a valuable contribution to the project. Asking others to analyze your code's performance problems is not a contribution. The Mozilla contributors to wgpu, at least, are prioritizing security and compatibility issues for the moment, so we are unlikely to look at this any time soon.
I've provided a benchmark. Render-bench is open source and exists to benchmark WGPU in an easily repeatable way. The jank and slowdown are even worse in my real application, Sharpview. That's hard to run reproducibly, though, which is why I provide a standalone benchmark.

Tracy profiling indicates that bind creation is slower. However, adding profiling of every bind call seriously impacts performance, so it's hard to get below that level. This slowdown should have been caught in regression testing of WGPU. How did it slip through testing? You're welcome to add render-bench to your test suite if you wish.

I realize that I have to go bindless eventually, but I was not expecting WGPU performance to suddenly degrade this badly. I can probably stay with WGPU 0.20 for a while. WGPU 22 is useless to me.
23 was released; is the issue still present there?
Ah, 23.0.0 just landed on crates.io. Nice. I will test. |
Trying to build with wgpu 23.0.0. There's a version problem. wgpu-profiler is now out of sync. The latest version on crates.io,
@John-Nagle:
I've got a PR up that fixes the build, but it points out that the demo is broken, and I haven't had time yet to fix that:
After forking and starting a PR to fix wgpu-profiler, I discovered that @Wumpf was also fixing it. The fix is on GitHub but not, as of last night, on crates.io. Working through the rest of the breaking changes in WGPU 23, so I don't know about performance yet.
OK, got everything up to WGPU 23. Minor progress: WGPU 23 is only 21% slower than WGPU 0.20, instead of the 23% slower that WGPU 22 was (33 FPS versus 37 FPS with WGPU 0.20). There's more jank than with WGPU 0.20, but less than with WGPU 22. Much of the time still goes into binding. The jank problem seems to be that, once in a while, a frame takes an extra 5 ms. This benchmark program is mostly drawing the same thing on each frame, so that shouldn't be happening.

To reproduce this, use the same procedure as at the top of this issue, but with branch "wgpu23safe". Please try that and see if you see the jank problem too. Times near "Adding buildings" and "Deleting buildings" are always longer (that's a different problem), but all the frames not near those actions should be almost identical. I'd appreciate someone else reproducing this on a different system; I'm using Ubuntu 22.04 LTS with an NVidia 3070 and a 6-CPU machine. Need comparisons. Thanks.

WGPU 23 frame rate info:
Compare data from WGPU 0.20:
Notice the much larger max values and standard deviations for WGPU 23. CPU usage suddenly increased for a few frames, almost as if a garbage collection were in progress. But this is Rust.
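For reference, a sketch of the kind of per-frame statistics being compared above (mean, max, and standard deviation of frame times); this is not render-bench's actual code, just an illustration of why the max and standard deviation capture the intermittent slow frames as well as the average slowdown:

```rust
// Sketch: summary statistics over a run's frame times, in milliseconds.
// Assumes a non-empty slice of measured frame times.
fn frame_stats(frame_times_ms: &[f64]) -> (f64, f64, f64) {
    let n = frame_times_ms.len() as f64;
    let mean = frame_times_ms.iter().sum::<f64>() / n;
    let max = frame_times_ms.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let variance = frame_times_ms.iter().map(|t| (t - mean).powi(2)).sum::<f64>() / n;
    (mean, max, variance.sqrt()) // (mean, max, standard deviation)
}
```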
Description
Wgpu22 seems to be 23% slower than wgpu 0.20 on my render-bench.
Repro steps
Build and run https://github.com/John-Nagle/render-bench/, comparing branches "hp" and "wgpu22safe". The second one uses WGPU 22 and all the more recent crates it requires.

1. Build with `cargo build --release --features tracy`.
2. Start up Tracy Profiler 0.11.1 and click "Connect". It will then wait for the traced program to start.
3. Run `./target/release/render-bench`.
4. Run for about a minute, with the graphics window on top and full screen, then close the graphics window.
5. While running, the program prints frame-time stats. Every 10 seconds, it adds or deletes a large number of objects. Look at the average frame time after adding objects.
Expected vs observed behavior
Frames average about 23% slower with WGPU 22.
There is almost no variation in frame time with WGPU 0.20, but there are slow frames a few times per second with WGPU 22. So there's a jank problem. All frames are slower, as well.
At least part of the problem is an intermittent lock stall in VkQueuePresentKHR. This is not the dominant user of time but seems to be responsible for the jank problem.
Note that this is a benchmark test, so that the developers can see this problem in a clean situation. My real program, the Sharpview metaverse viewer, has these problems, and they're much worse there. Occasional frames as slow as 200ms. So bad that WGPU 22 is unusable. Sharpview has a lot more going on in other threads than this benchmark, which only has two threads, one of which is idle except when it changes the scene every 10 seconds.
The VkQueuePresentKHR call is in wgpu/wgpu-hal/src/vulkan/mod.rs near line 1259. Nothing jumps out at me in "Blame".
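From the application side, the stall around present can be bracketed with its own profiling span so it separates cleanly from the rest of the frame. A sketch, assuming the tracy-client crate; this is not the benchmark's actual code:

```rust
// Sketch: measure swapchain acquire and present separately from the app side.
// On the Vulkan backend, vkQueuePresentKHR runs under SurfaceTexture::present().
fn present_frame(surface: &wgpu::Surface<'_>) -> Result<(), wgpu::SurfaceError> {
    let acquire_span = tracy_client::span!("acquire swapchain image");
    let frame = surface.get_current_texture()?;
    drop(acquire_span);

    // ... encode and submit the frame's command buffers here ...

    let _present_span = tracy_client::span!("present");
    frame.present();
    Ok(())
}
```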
Extra materials
Tracing results follow in updates.
Platform
Ubuntu 22.04 LTS. NVidia 3070.