
Wgpu22 23% slower than wgpu 0.20 on my render-bench. #6434

Open
John-Nagle opened this issue Oct 21, 2024 · 28 comments

Comments

@John-Nagle

Description
Wgpu22 seems to be 23% slower than wgpu 0.20 on my render-bench.

Repro steps
Build and run https://github.com/John-Nagle/render-bench/, branches "hp" vs. "wgpu22safe". The second uses WGPU 22 and the more recent crates it requires. Build with

cargo build --release --features tracy
Start up Tracy Profiler 0.11.1 and click "Connect". It will then wait for the traced program to start.
./target/release/render-bench

Run for about a minute, with the graphics window on top and full screen, then close the graphics window.

When you run the program, it prints frame time stats. Every 10 seconds, it adds or deletes a large number of objects. Look at the average frame time after objects are added.
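For reference, a minimal sketch of the kind of per-second frame statistics the program prints (hypothetical code, not render-bench's actual reporting; the struct and field names are invented):

```rust
use std::time::{Duration, Instant};

// Hypothetical sketch of per-second frame statistics like the ones the
// benchmark prints (min / average / 95% / max); not render-bench's code.
struct FrameStats {
    samples: Vec<Duration>,
    last_report: Instant,
}

impl FrameStats {
    fn new() -> Self {
        Self { samples: Vec::new(), last_report: Instant::now() }
    }

    /// Call once per frame with that frame's duration.
    fn record(&mut self, frame_time: Duration) {
        self.samples.push(frame_time);
        if self.last_report.elapsed() >= Duration::from_secs(1) {
            self.report();
            self.samples.clear();
            self.last_report = Instant::now();
        }
    }

    fn report(&mut self) {
        self.samples.sort();
        let ms = |d: &Duration| d.as_secs_f64() * 1000.0;
        let n = self.samples.len();
        let avg = self.samples.iter().map(ms).sum::<f64>() / n as f64;
        let p95 = &self.samples[(n * 95 / 100).min(n - 1)];
        println!(
            "{n:05} frames. Min: {:.2}ms; Average: {avg:.2}ms; 95%: {:.2}ms; Max: {:.2}ms",
            ms(&self.samples[0]),
            ms(p95),
            ms(&self.samples[n - 1]),
        );
    }
}
```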

Expected vs observed behavior
Frames average about 23% slower with WGPU 22.

There is almost no variation in frame time with WGPU 0.20, but there are slow frames a few times per second with WGPU 22. So there's a jank problem. All frames are slower, as well.

At least part of the problem is an intermittent lock stall in vkQueuePresentKHR. This is not the dominant user of time, but it seems to be responsible for the jank problem.

Note that this is a benchmark test, so that the developers can see this problem in a clean situation. My real program, the Sharpview metaverse viewer, has these problems, and they're much worse there: occasional frames as slow as 200ms, bad enough that WGPU 22 is unusable. Sharpview has a lot more going on in other threads than this benchmark, which only has two threads, one of which is idle except when it changes the scene every 10 seconds.

The lock stall is in wgpu/wgpu-hal/src/vulkan/mod.rs near line 1259. Nothing jumps out at me in "Blame".

Extra materials
Tracing results follow in updates.

Platform
Ubuntu 22.04 LTS. NVidia 3070.

@John-Nagle
Author

Pictures of traces:
WGPU 0.20, one frame.

Note lack of red bars at top. All frames have consistent time. For this part of the run, render-bench is just refreshing the screen, with no changes.

WGPU 22 trace.

Note red bars at top. Those are slow frames. They're intermittent. This is a source of jank in a simple situation that should be jank-free.

@John-Nagle
Author

John-Nagle commented Oct 21, 2024

Zoom in on the jank trouble spot.

(Tracy trace screenshot: tracewgpu22safe1zoom)

Most of this seems to be outside WGPU. Yet all that was changed was to convert Rend3 to use WGPU22, with the required version changes for EGUI and WINIT. This benchmark doesn't use EGUI, and WINIT isn't doing anything in the middle of a frame.

On a few frames I saw what seemed to be an intermittent lock stall in vkQueuePresentKHR. That didn't happen with WGPU 0.20. But it's not where much of the time is going.

Did something get bigger that's forcing cache misses?

@John-Nagle
Author

(All this was tested on native desktop, not on the web.)

@ErichDonGubler
Member

Regarding locking, there were some changes after 0.20.* in which we discovered and fixed some UB with Vulkan synchronization. They're in the CHANGELOG, if you'd like to find them (I'm on mobile, so linking takes more time than I'd like 😅).

@John-Nagle
Author

Where is that? I'm looking in https://github.com/gfx-rs/wgpu/blob/trunk/CHANGELOG.md and not finding any relevant changes later than 0.19, which predates this regression.

@ErichDonGubler
Member

Ah, shoot, I made an off-by-one error; the PR I was thinking of is @cwfitzgerald's #5681, but that should be released as of 0.20.

🫤 I don't think I have any relevant information any more, sorry!

@John-Nagle
Author

OK, thanks.

For some slow frames, there's about 0.5ms of stall at the beginning of the frame, in vkQueuePresentKHR. Usually that's so fast it barely shows up in profiling, but sometimes it takes longer. No idea why. The program at this point is rendering the same image over and over. That's a source of jank.

That's not where most of the excess time is going, though. Some time goes into frustum culling and depth-sorting objects. Sorting does not seem to be slower, though, so although that may be a performance issue, it's not this problem.

There's a call to .get_mapped_range_mut(). Is it possible to get a stall there if something is happening GPU-side? I need to add more profiling calls in Rend3 to pin this down.
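For context, the usual wgpu write-mapping pattern around get_mapped_range_mut() looks roughly like this (a hypothetical sketch, not Rend3's actual code; it assumes a buffer created with MAP_WRITE usage). If the GPU still owns the buffer, the wait for the mapping to become ready is one place a stall could show up:

```rust
// Hypothetical sketch of the usual wgpu write-mapping pattern; names and
// structure are placeholders, not Rend3's code. Assumes `buffer` was created
// with wgpu::BufferUsages::MAP_WRITE.
fn write_to_buffer(device: &wgpu::Device, buffer: &wgpu::Buffer, data: &[u8]) {
    let slice = buffer.slice(..);
    let (tx, rx) = std::sync::mpsc::channel();
    slice.map_async(wgpu::MapMode::Write, move |result| {
        tx.send(result).ok();
    });
    // Block until the mapping is ready. If the GPU is still using the buffer,
    // this wait is where a stall would appear.
    let _ = device.poll(wgpu::Maintain::Wait);
    rx.recv().unwrap().expect("buffer mapping failed");

    let mut view = slice.get_mapped_range_mut();
    view[..data.len()].copy_from_slice(data);
    drop(view);
    buffer.unmap();
}
```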

@John-Nagle
Author

Much of the additional time is going into set_bind_group. Did that get slower?
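For what it's worth, here is a hypothetical sketch (placeholder names and data layout, not Rend3's actual draw loop) of where that per-call cost shows up:

```rust
// Hypothetical per-object bind-and-draw loop inside a render pass; not
// Rend3's actual code.
fn draw_objects<'a>(
    pass: &mut wgpu::RenderPass<'a>,
    pipeline: &'a wgpu::RenderPipeline,
    objects: &'a [(wgpu::BindGroup, std::ops::Range<u32>)],
) {
    pass.set_pipeline(pipeline);
    for (bind_group, vertices) in objects {
        // With tens of thousands of objects per frame, even a small constant
        // cost per set_bind_group call adds up to milliseconds.
        pass.set_bind_group(0, bind_group, &[]);
        pass.draw(vertices.clone(), 0..1);
    }
}
```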

@teoxoy
Member

teoxoy commented Oct 22, 2024

It probably did, due to #5874, but with #6419 (not yet part of any release) it should no longer be an issue.

@John-Nagle
Author

John-Nagle commented Oct 22, 2024

From #6419:

The problem here as far as I can tell is that we are creating 160k unique bind groups that all share the same texture in a loop.

which my render-bench is doing. So that's likely to be at least part of the problem. Thanks.
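As a rough illustration (hypothetical names, not render-bench's exact code), that pattern looks something like this:

```rust
// Hypothetical sketch of the pattern described in #6419: many per-object
// bind groups created in a loop, all referencing the same texture view and
// sampler. Names are placeholders, not render-bench's actual code.
fn make_object_bind_groups(
    device: &wgpu::Device,
    layout: &wgpu::BindGroupLayout,
    view: &wgpu::TextureView,
    sampler: &wgpu::Sampler,
    object_count: usize,
) -> Vec<wgpu::BindGroup> {
    (0..object_count)
        .map(|_| {
            device.create_bind_group(&wgpu::BindGroupDescriptor {
                label: Some("per-object bind group"),
                layout,
                entries: &[
                    wgpu::BindGroupEntry {
                        binding: 0,
                        resource: wgpu::BindingResource::TextureView(view),
                    },
                    wgpu::BindGroupEntry {
                        binding: 1,
                        resource: wgpu::BindingResource::Sampler(sampler),
                    },
                ],
            })
        })
        .collect()
}
```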

How soon can I test against something with #6419 and find out whether this really is the problem? While render-bench does do that, the actual application uses many more distinct textures, so there might be something else as well.

@teoxoy
Member

teoxoy commented Oct 23, 2024

There will probably be a release this week.

@John-Nagle
Author

Oh, good. Please let me know when there's a branch I can test. Thanks.

@teoxoy
Member

teoxoy commented Oct 23, 2024

Actually #6419 is already on trunk if you want to test it early.

@John-Nagle
Author

Not sure about the version situation.

  • I'm currently using released wgpu 22.1.0, from crates.io.
  • Trunk on github is versioned as the older 22.0.0.
  • Branch "wgpu22" on Github is versioned as 22.1.0 and is both ahead of and behind trunk.
  • Changelog has nothing for wgpu 22.1.0, just "unreleased" since 22.0.0.

Somewhat confused here. Is 22.1.0 being backed out?

If a release to crates.io is expected this week, I'm tempted to wait. What will the next version number be? Will there be breaking changes that break egui or winit?

@Wumpf
Member

Wumpf commented Oct 23, 2024

The problem is that the version on trunk never got bumped to 22.1.0 (or beyond); that only happened on a separate branch for the 22.1.0 minor release.

@John-Nagle
Author

Branch wgpu22: "This branch is 10 commits ahead of, 449 commits behind trunk."

Is some merging needed?

Could the version on trunk be bumped to a number greater than 22.1.0? If there are no breaking changes, 22.1.1 would be good. If there are breaking changes that break egui again, that's a problem, because it takes months for that side to catch up.

@ErichDonGubler
Member

I think there might be room for adding a merge of non-major releases' CHANGELOG entries and versions back into trunk, but I think that we as a maintainership will need to discuss it first. Will discuss in our next maintainers' meeting.

@John-Nagle
Author

There will probably be a release this week.

It's now next week. Waiting.

@ErichDonGubler
Member

@John-Nagle: see #6465.

@jimblandy
Member

jimblandy commented Oct 29, 2024

It's now next week. Waiting.

@John-Nagle Your tone here is impatient, which suggests you have a misunderstanding.

wgpu is not a product you have paid for, and which would thus have an obligation to address your concerns. Rather, it is a volunteer project which freely provides its output for you to use and contribute to.

If you are able to analyze the problem and point out what aspect of wgpu is causing it, that would be a valuable contribution to the project. Asking others to analyze your code's performance problems is not a contribution.

The Mozilla contributors to wgpu, at least, are prioritizing security and compatibility issues for the moment, so we are unlikely to look at this any time soon.

@John-Nagle
Author

I've provided a benchmark. Render-bench is open source and exists to benchmark WGPU in an easily repeatable way. The jank and slowdown are even worse in my real application, Sharpview. That's hard to run reproducibly, though, which is why I provide a standalone benchmark.

Tracy profiling indicates that bind creation is slower. However, adding profiling of every bind call seriously impacts performance, so it's hard to get below that level.
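One way to keep the profiling overhead bounded is a single Tracy zone around the whole bind-group-building loop rather than one per call; a hypothetical sketch using the profiling crate with its Tracy backend (not Rend3's actual instrumentation):

```rust
// Hypothetical sketch (placeholder names, not Rend3's instrumentation): one
// Tracy zone around the whole loop via the `profiling` crate, built with its
// "profile-with-tracy" feature, instead of a zone per create_bind_group call.
fn build_object_bind_groups(object_count: usize) {
    profiling::scope!("build_object_bind_groups");
    for _ in 0..object_count {
        // device.create_bind_group(...) per object goes here; a zone inside
        // this loop would itself distort the timings at this call volume.
    }
}
```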

This slowdown should have been caught in regression testing of WGPU. How did it slip through testing? You're welcome to add render-bench to your test suite if you wish.

I realize that I have to go bindless eventually, but I was not expecting WGPU performance to suddenly degrade this badly.

I can probably stay with WGPU 0.20 for a while. WGPU 22 is useless to me.

@sagudev
Contributor

sagudev commented Oct 31, 2024

23 was released; is the issue still present there?

@John-Nagle
Author

Ah, 23.0.0 just landed on crates.io. Nice. I will test.

@John-Nagle
Author

Trying to build with wgpu 23.0.0.

There's a version problem: wgpu-profiler is now out of sync. The latest version on crates.io, wgpu-profiler v0.18.2, depends on wgpu ^22.1.0, so programs building this stack pull in two versions of wgpu and get compile errors.

@ErichDonGubler
Member

@John-Nagle: wgpu-profiler is a third-party crate maintained by @Wumpf. I strongly suspect he would welcome a contribution from you to upgrade wgpu-profiler to v23.

@waywardmonkeys
Contributor

I've got a PR up that fixes the build, but it points out that the demo is broken, and I haven't had time to fix that yet:

Wumpf/wgpu-profiler#88

@John-Nagle
Author

@John-Nagle: wgpu-profiler is a third-party crate maintained by @Wumpf. I strongly suspect he would welcome a contribution from you to upgrade wgpu-profiler to v23.

After forking and starting a PR to fix wgpu-profiler, I discovered that @Wumpf was also fixing it. The fix is on GitHub but, as of last night, not on crates.io. I'm working through the rest of the breaking changes in WGPU 23, so I don't know about performance yet.

@John-Nagle
Author

OK, got everything up to WGPU 23. Minor progress: WGPU 23 is only 21% slower than WGPU 0.20, instead of the 23% with WGPU 22 (33 FPS, versus 37 FPS with WGPU 0.20). There's more jank than with WGPU 0.20, but less than with WGPU 22.

(Tracy profiling screenshot: tracewgpu23safe1zoom)

Much of the time still goes into binding.

The jank problem seems to be that, once in a while, a frame takes an extra 5ms. This benchmark program is mostly drawing the same thing on each frame, so that shouldn't be happening.

To reproduce this, use the same procedure as at the top of this issue, but with branch "wgpu23safe". Please try that and see if you see the jank problem, too. Times near "Adding buildings" and "Deleting buildings" are always longer (that's a different problem) but all the frames not near those actions should be almost identical.

I'd appreciate someone else reproducing this on a different system. I'm using Ubuntu 22.04 LTS with an NVidia 3070, and a 6-CPU machine. Need comparisons. Thanks.

WGPU 23 frame rate info:

00060 frames over 01.00s. Min: 16.20ms; Average: 16.68ms; 95%: 17.14ms; 99%: 17.16ms; Max: 17.16ms; StdDev: 00.21ms
00060 frames over 01.00s. Min: 16.20ms; Average: 16.68ms; 95%: 17.16ms; 99%: 17.20ms; Max: 17.20ms; StdDev: 00.22ms
00061 frames over 01.02s. Min: 16.20ms; Average: 16.68ms; 95%: 17.12ms; 99%: 17.17ms; Max: 17.17ms; StdDev: 00.20ms
Adding buildings.
Adding buildings completed. 58175 meshes added.
00046 frames over 01.03s. Min: 16.12ms; Average: 22.33ms; 95%: 52.23ms; 99%: 52.26ms; Max: 52.26ms; StdDev: 11.76ms
00032 frames over 01.01s. Min: 30.07ms; Average: 31.48ms; 95%: 38.76ms; 99%: 38.76ms; Max: 38.76ms; StdDev: 01.55ms
00032 frames over 01.00s. Min: 29.79ms; Average: 31.33ms; 95%: 33.75ms; 99%: 33.75ms; Max: 33.75ms; StdDev: 00.80ms
00033 frames over 01.03s. Min: 30.18ms; Average: 31.16ms; 95%: 32.90ms; 99%: 32.90ms; Max: 32.90ms; StdDev: 00.70ms
00033 frames over 01.02s. Min: 29.79ms; Average: 31.02ms; 95%: 34.77ms; 99%: 34.77ms; Max: 34.77ms; StdDev: 00.93ms
00032 frames over 01.00s. Min: 30.11ms; Average: 31.39ms; 95%: 32.87ms; 99%: 32.87ms; Max: 32.87ms; StdDev: 00.65ms
00033 frames over 01.02s. Min: 29.75ms; Average: 30.91ms; 95%: 32.72ms; 99%: 32.72ms; Max: 32.72ms; StdDev: 00.65ms
00033 frames over 01.02s. Min: 30.00ms; Average: 31.00ms; 95%: 32.29ms; 99%: 32.29ms; Max: 32.29ms; StdDev: 00.55ms
00032 frames over 01.01s. Min: 30.13ms; Average: 31.67ms; 95%: 37.29ms; 99%: 37.29ms; Max: 37.29ms; StdDev: 01.36ms
00033 frames over 01.03s. Min: 30.11ms; Average: 31.08ms; 95%: 36.50ms; 99%: 36.50ms; Max: 36.50ms; StdDev: 01.11ms
Deleting buildings.
Deleting buildings completed. 58175 meshes deleted.

Compare data from WGPU 0.20:

00060 frames over 01.00s. Min: 16.07ms; Average: 16.68ms; 95%: 17.12ms; 99%: 17.24ms; Max: 17.24ms; StdDev: 00.27ms
Adding buildings.
00056 frames over 01.02s. Min: 16.24ms; Average: 18.16ms; 95%: 34.54ms; 99%: 35.39ms; Max: 35.39ms; StdDev: 04.36ms
Adding buildings completed. 58175 meshes added.
00034 frames over 01.03s. Min: 26.59ms; Average: 30.18ms; 95%: 45.45ms; 99%: 45.45ms; Max: 45.45ms; StdDev: 05.19ms
00037 frames over 01.01s. Min: 26.53ms; Average: 27.23ms; 95%: 28.74ms; 99%: 28.74ms; Max: 28.74ms; StdDev: 00.43ms
00037 frames over 01.01s. Min: 26.61ms; Average: 27.37ms; 95%: 29.02ms; 99%: 29.02ms; Max: 29.02ms; StdDev: 00.48ms
00037 frames over 01.02s. Min: 26.85ms; Average: 27.54ms; 95%: 28.75ms; 99%: 28.75ms; Max: 28.75ms; StdDev: 00.38ms
00037 frames over 01.02s. Min: 26.89ms; Average: 27.48ms; 95%: 28.93ms; 99%: 28.93ms; Max: 28.93ms; StdDev: 00.38ms
00037 frames over 01.02s. Min: 26.94ms; Average: 27.56ms; 95%: 29.20ms; 99%: 29.20ms; Max: 29.20ms; StdDev: 00.43ms
00037 frames over 01.02s. Min: 26.71ms; Average: 27.59ms; 95%: 28.82ms; 99%: 28.82ms; Max: 28.82ms; StdDev: 00.44ms
00037 frames over 01.02s. Min: 26.82ms; Average: 27.50ms; 95%: 28.72ms; 99%: 28.72ms; Max: 28.72ms; StdDev: 00.35ms
00037 frames over 01.02s. Min: 26.90ms; Average: 27.62ms; 95%: 29.44ms; 99%: 29.44ms; Max: 29.44ms; StdDev: 00.53ms
Deleting buildings.
Deleting buildings completed. 58175 meshes deleted.
00037 frames over 01.02s. Min: 26.71ms; Average: 27.48ms; 95%: 28.90ms; 99%: 28.90ms; Max: 28.90ms; StdDev: 00.46ms
00062 frames over 01.01s. Min: 12.68ms; Average: 16.27ms; 95%: 17.02ms; 99%: 40.86ms; Max: 40.86ms; StdDev: 04.10ms

Notice the much larger max values and standard deviations for WGPU 23. CPU usage suddenly increased for a few frames, almost as if a garbage collection were in progress. But this is Rust.
