Reintroduce static lock order enforcement #5204

jimblandy · 2024-02-05T20:29:17Z

Since Arcanization, we've been running into deadlocks at a pretty regular pace. This week's update of wgpu in Firefox exposed two new deadlocks which @nical has debugged and posted fixes for.

Before Arcanization, wgpu-core had a somewhat obscure rigging of tokens and types and traits ("oh my") such that acquiring locks in the wrong order would be a compile-time error. It wasn't well-documented, and people generally found it hard to understand. Arcanization removed this rigging, to cheers and rejoicing, and we immediately started running into deadlocks. Some were sorted out before Arcanization landed, but we've continued to find them since.
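For readers who never saw the old rigging, here is a minimal sketch of the general technique, not the actual pre-Arcanization wgpu-core code. All names (`Root`, `LevelA`, `LevelB`, `After`, `Token`, `Ordered`, `root_token`, `demo`) are hypothetical. Each lock belongs to a level, and locking requires a token for a level that the lock is allowed to follow, so a wrong acquisition order has no matching trait impl and fails to compile.

```rust
use std::marker::PhantomData;
use std::sync::{Mutex, MutexGuard};

// Hypothetical lock levels, for illustration only.
struct Root; // "I hold no locks"
struct LevelA;
struct LevelB;

/// `Self: After<Held>` means a lock at level `Self` may be acquired
/// while holding a token for level `Held`.
trait After<Held> {}
impl After<Root> for LevelA {}
impl After<Root> for LevelB {}
impl After<LevelA> for LevelB {}
// Deliberately no `impl After<LevelB> for LevelA`: A must come before B.

/// Proof that the holder's most recently acquired lock is at level `L`.
struct Token<'a, L> {
    _marker: PhantomData<&'a mut L>,
}

fn root_token() -> Token<'static, Root> {
    Token { _marker: PhantomData }
}

/// A mutex tagged with its level in the lock order.
struct Ordered<L, T> {
    inner: Mutex<T>,
    _level: PhantomData<L>,
}

impl<L, T> Ordered<L, T> {
    fn new(value: T) -> Self {
        Ordered { inner: Mutex::new(value), _level: PhantomData }
    }

    /// Lock this mutex. The mutable borrow of `held` keeps the older
    /// token unusable for as long as the returned token is alive.
    fn lock<'t, Held>(
        &'t self,
        _held: &'t mut Token<'_, Held>,
    ) -> (MutexGuard<'t, T>, Token<'t, L>)
    where
        L: After<Held>,
    {
        (self.inner.lock().unwrap(), Token { _marker: PhantomData })
    }
}

fn demo() -> u32 {
    let a: Ordered<LevelA, u32> = Ordered::new(1);
    let b: Ordered<LevelB, u32> = Ordered::new(2);

    let mut root = root_token();
    let (guard_a, mut token_a) = a.lock(&mut root);
    let (guard_b, _token_b) = b.lock(&mut token_a);
    // Locking `b` first and then `a` with b's token would not compile,
    // because `LevelA: After<LevelB>` is not implemented.
    let sum = *guard_a + *guard_b;
    sum
}
```

The cost of this style is visible even in the sketch: every function that locks anything has to thread a token through its signature, which is a large part of why people found the old system hard to follow.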

We should consider adding this static scaffolding back in --- accompanied by good global documentation for wgpu-core's locking discipline, to provide an explanation we can point to when people are confused.

Note that wgpu-core's locks post-Arcanization bear little resemblance to the ones we had before. Previously, the lock hierarchy corresponded to resource types: Buffer, BindGroup, Device, and so on. But now all of the Registry RwLocks are "leaf" locks: they're only held for short periods of time, without acquiring other locks (I think that's right), so they don't have interesting relationships with each other. Instead, we'll be establishing an ordering between the various fields of Device like life_tracker and temp_suspected.

We'll have to think carefully about how SnatchLock fits into the picture.

jimblandy · 2024-02-05T20:38:36Z

Probably worth mentioning that the older system that I'm suggesting we re-introduce was not entirely static. You always had to start by obtaining a "root" token, representing "I hold no locks", which you'd then use to acquire your first lock. In order to prevent code from simply grabbing root tokens and locking whatever it pleased, there was a per-thread flag restricting each thread to one root token at a time - a dynamic check.

So attempts to acquire multiple root tokens would still be detected dynamically. However, the error message could be a lot more specific than it is now.
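A rough sketch of what such a per-thread check can look like, assuming a thread-local flag and a token that clears it on drop. The names `ROOT_TAKEN` and `RootToken` are hypothetical and this is not the code wgpu-core used.

```rust
use std::cell::Cell;
use std::marker::PhantomData;

thread_local! {
    // Hypothetical flag: true while this thread holds a root token.
    static ROOT_TAKEN: Cell<bool> = Cell::new(false);
}

/// Represents "this thread holds no locks yet". At most one may exist
/// per thread at a time; that rule is checked at run time.
struct RootToken {
    // The raw pointer marker makes the token !Send and !Sync, so it
    // cannot move away from the thread whose flag it controls.
    _not_send: PhantomData<*const ()>,
}

impl RootToken {
    /// Returns `None` if this thread already holds a root token.
    fn try_acquire() -> Option<RootToken> {
        ROOT_TAKEN.with(|taken| {
            if taken.get() {
                None
            } else {
                taken.set(true);
                Some(RootToken { _not_send: PhantomData })
            }
        })
    }

    /// Panics with a specific message instead of deadlocking later.
    fn acquire() -> RootToken {
        Self::try_acquire()
            .expect("this thread already holds a root token; lock order cannot be verified")
    }
}

impl Drop for RootToken {
    fn drop(&mut self) {
        // Releasing the token lets the thread start a new lock chain.
        ROOT_TAKEN.with(|taken| taken.set(false));
    }
}
```

With something like this, a second `RootToken::acquire()` on the same thread fails immediately at the call site, which is where the more specific error message would come from.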

nical · 2024-02-05T20:50:19Z

Before we add back a complicated and/or invasive system, I'd like us to map out and document the locking story in wgpu-core, see how many long-lived locks we have, and then evaluate whether we need anything more than discouraging short-lived locks from accidentally becoming long-lived (the two I fixed today were in that category).

teoxoy · 2024-07-04T07:48:10Z

I think we can close this now that we plan to land #5586 and we have #5572 tracking all the deadlocks.

teoxoy closed this as not planned on Jul 4, 2024