Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reintroduce static lock order enforcement #5204

Closed
jimblandy opened this issue Feb 5, 2024 · 3 comments
Closed

Reintroduce static lock order enforcement #5204

jimblandy opened this issue Feb 5, 2024 · 3 comments

Comments

@jimblandy
Copy link
Member

jimblandy commented Feb 5, 2024

Since Arcanization, we've been running into deadlocks at a pretty regular pace. This week's update of wgpu in Firefox exposed two new deadlocks which @nical has debugged and posted fixes for.

Before Arcanization, wgpu-core had a somewhat obscure rigging of tokens and types and traits ("oh my") such that acquiring locks in the wrong order would be a compile-time error. It wasn't well-documented, and people generally found it hard to understand. Arcanization removed this rigging, to cheers and rejoicing --- and we immediately started running into deadlocks. Some were sorted out before arcanization landed, but we've continued to find them since.

We should consider adding this static scaffolding back in --- accompanied by good global documentation for wgpu-core's locking discipline, to provide an explanation we can point to when people are confused.

Note that wgpu-core's locks post-arcanization bear little resemblance to the ones we had before. Previously, the lock hierarchy corresponded to resource types: Buffer, BindGroup, Device, and so on. But now all Registrys' RwLocks are "leaf" locks - they're only held for short periods of time, without acquiring other locks (I think that's right), so they don't have interesting relationships with each other. Instead, we'll be establishing an ordering between the various fields of Device like life_tracker and temp_suspected.

We'll have to think carefully about how SnatchLock fits into the picture.

@jimblandy
Copy link
Member Author

Probably worth mentioning that the older system that I'm suggesting we re-introduce was not entirely static. You always had to start by obtaining a "root" token, representing "I hold no locks", which you'd then use to acquire your first lock. In order to prevent code from simply grabbing root tokens and locking whatever it pleased, there was a per-thread flag restricting each thread to one root token at a time - a dynamic check.

So attempts to acquire multiple root tokens would still be detected dynamically. However, the error message could be a lot more specific than it is now.

@nical
Copy link
Contributor

nical commented Feb 5, 2024

Before we add back a complicated and/or invasive system, I'd like us to map out and document what the locking story is like in wgpu-core, see how many long lived locks we have and then evaluate whether we need something more than discouraging short-lived locks from accidentally becoming long lived (the two I fixed today were in that category).

@teoxoy
Copy link
Member

teoxoy commented Jul 4, 2024

I think we can close this now that we plan to land #5586 and we have #5572 tracking all the deadlocks.

@teoxoy teoxoy closed this as not planned Won't fix, can't repro, duplicate, stale Jul 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants