[WIP] Implement arena allocation for VersionVec #12
Unsure how this worked before, but the macro used to declare the scheduler's state never existed in scoped-tls afaict. Fix involves doing disgusting things to RefCell (so rather hacky).
To allow for cloning objects allocated in the arena, some rearrangements had to be made, namely moving allocation functions to a place where they can be used with a shared reference to the arena. This sounds dangerous, as no synchronization is done. However, it should be safe: with every scheduler, no true concurrency is present in the loom runtime, and the only scheduler using multiple threads (`std`) places the global execution state behind a mutex. Still, this is somewhat of an opinionated approach to integrating the arena allocator into loom.
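To make the "allocate through a shared reference" idea concrete, here is a minimal sketch. It is not the code from this PR: the `Arena` type, its fields, and `clone_range` are hypothetical, and interior mutability via `Cell`/`RefCell` is just one way to get there. Both types are `!Sync`, so the compiler keeps this arena confined to one thread unless it is wrapped in something like a mutex.

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical bump arena handing out offsets into one buffer.
/// `Cell`/`RefCell` make it `!Sync`, so it cannot be shared across
/// threads without external locking.
pub struct Arena {
    buf: RefCell<Vec<usize>>,
    next: Cell<usize>,
}

impl Arena {
    pub fn with_capacity(cap: usize) -> Arena {
        Arena { buf: RefCell::new(vec![0; cap]), next: Cell::new(0) }
    }

    /// Reserves `len` zeroed slots through a shared reference.
    /// Returns the start offset, or `None` if the arena is full.
    pub fn alloc(&self, len: usize) -> Option<usize> {
        let start = self.next.get();
        let end = start.checked_add(len)?;
        if end > self.buf.borrow().len() {
            return None;
        }
        self.next.set(end);
        Some(start)
    }

    /// Copies an existing allocation into a fresh one. Only `&self` is
    /// needed, which is what makes cloning arena-backed objects possible.
    pub fn clone_range(&self, start: usize, len: usize) -> Option<usize> {
        let new_start = self.alloc(len)?;
        self.buf.borrow_mut().copy_within(start..start + len, new_start);
        Some(new_start)
    }

    pub fn set(&self, idx: usize, val: usize) {
        self.buf.borrow_mut()[idx] = val;
    }

    pub fn get(&self, idx: usize) -> usize {
        self.buf.borrow()[idx]
    }

    /// Zeroes the buffer and resets the bump pointer.
    /// Offsets handed out earlier must not be used afterwards.
    pub fn clear(&self) {
        self.buf.borrow_mut().fill(0);
        self.next.set(0);
    }
}
```

With only `&Arena` in hand, a `Clone` impl for an arena-backed object can call something like `clone_range` without needing exclusive access to the arena.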
As a first step, allocating clock vectors in the arena seems reasonable and makes it possible to gauge how much complexity such a change introduces. At the moment, the arena has to be passed around to any code that should be able to allocate in it. However, a globally accessible arena could prove to be more elegant. Also, not all `VersionVec` instances are deallocated at the end of each iteration, so these are currently handled by still allowing `VersionVec`s to be allocated on the heap as before.
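One possible shape for a `VersionVec` that can live in either place is an enum with two variants. The sketch below is an assumption, not the layout used in this PR: `SharedArena`, the variant names, `new_in`, and `new_heap` are made up for illustration, and the arena is reduced to a shared growable buffer.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for the arena: a shared, growable buffer.
pub type SharedArena = Rc<RefCell<Vec<usize>>>;

/// A clock vector stored either in the arena or on the heap.
pub enum VersionVec {
    /// Goes away together with the arena at the end of an iteration.
    InArena { arena: SharedArena, start: usize, len: usize },
    /// Outlives iterations, allocated on the heap as before.
    OnHeap(Box<[usize]>),
}

impl VersionVec {
    pub fn new_in(arena: &SharedArena, len: usize) -> VersionVec {
        let mut buf = arena.borrow_mut();
        let start = buf.len();
        buf.resize(start + len, 0);
        VersionVec::InArena { arena: Rc::clone(arena), start, len }
    }

    pub fn new_heap(len: usize) -> VersionVec {
        VersionVec::OnHeap(vec![0; len].into_boxed_slice())
    }

    pub fn len(&self) -> usize {
        match self {
            VersionVec::InArena { len, .. } => *len,
            VersionVec::OnHeap(v) => v.len(),
        }
    }

    pub fn get(&self, i: usize) -> usize {
        assert!(i < self.len());
        match self {
            VersionVec::InArena { arena, start, .. } => arena.borrow()[*start + i],
            VersionVec::OnHeap(v) => v[i],
        }
    }

    pub fn set(&mut self, i: usize, value: usize) {
        assert!(i < self.len());
        match self {
            VersionVec::InArena { arena, start, .. } => arena.borrow_mut()[*start + i] = value,
            VersionVec::OnHeap(v) => v[i] = value,
        }
    }

    /// Component-wise maximum, regardless of where either side is stored.
    pub fn join(&mut self, other: &VersionVec) {
        assert_eq!(self.len(), other.len());
        for i in 0..self.len() {
            let theirs = other.get(i);
            if theirs > self.get(i) {
                self.set(i, theirs);
            }
        }
    }
}
```

The cost of this design is a branch on every access, which may or may not matter for the workloads benchmarked here.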
Thanks for submitting the PR. Could you post before / after benchmark results? Also, what version of |
Argh, sorry -- I forgot to reference all the relevant information as it's already included in the proposal. Luckily, the relevant bits are available here. To summarize: Commit LOOM_MAX_DURATION=10 hyperfine \
--export-markdown bench.md ./fuzz_semaphore_arena ./fuzz_semaphore_system
I realize I also forgot to test the serialization feature and accidentally committed some nightly-only stuff, though. I'll fix that shortly.
It looks like the measurement is the total time of execution, but you are also capping it with The fact that an arena is slower is unexpected. I would dig in and see what the time is spent on w/ a profile. |
Yeah, that is odd. However, since multiple different tests are run in the binary, the rough ballpark of the runtimes makes sense. I don't yet have an explanation for the slowdown, but I'm working on that.
For now, we only deserialize to heap-allocated `VersionVec`s. Also, we now have an iterator for arena slices.
Aaand I forgot windows has no |
Excuse the force-push -- but Windows builds should work now!
* Less inefficient cloning of `Slice`s
* Much more efficient dropping of `Slice`s
Alright, most of the performance regression is now gone after optimizing the |
Just a quick update: I'm currently experimenting with ways to provide arena allocation for other uses of |
Quick update from me. I’ve been on vacation. I’ll be digging in next week. Expect to hear more then.
I'm exploring the possibility of storing the arena in a scoped thread-local variable to buy us some flexibility wrt where exactly the arena can be used. I'll benchmark a simple arena-backed vector implementation based on that.
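A rough sketch of what an arena-backed vector on top of a thread-local could look like. Note the assumptions: this uses the standard library's `thread_local!` rather than the `scoped-tls` crate mentioned above, and `ARENA`, `ArenaVec`, `clear_arena`, and `arena_used` are hypothetical names, not loom's API.

```rust
use std::cell::RefCell;

thread_local! {
    /// Hypothetical per-thread arena: one backing buffer whose length
    /// doubles as the bump pointer.
    static ARENA: RefCell<Vec<usize>> = RefCell::new(Vec::new());
}

/// Arena-backed vector handle: an offset and a length into the
/// thread-local buffer, so no arena reference has to be passed around.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ArenaVec {
    start: usize,
    len: usize,
}

impl ArenaVec {
    /// Allocates `len` zeroed elements in the current thread's arena.
    pub fn new(len: usize) -> ArenaVec {
        ARENA.with(|a| {
            let mut buf = a.borrow_mut();
            let start = buf.len();
            buf.resize(start + len, 0);
            ArenaVec { start, len }
        })
    }

    pub fn set(&self, i: usize, value: usize) {
        assert!(i < self.len);
        ARENA.with(|a| a.borrow_mut()[self.start + i] = value);
    }

    pub fn get(&self, i: usize) -> usize {
        assert!(i < self.len);
        ARENA.with(|a| a.borrow()[self.start + i])
    }
}

/// Drops every allocation at once, e.g. between model checker iterations.
/// Handles created earlier are stale afterwards; using one panics on the
/// bounds check instead of reading freed memory.
pub fn clear_arena() {
    ARENA.with(|a| a.borrow_mut().clear());
}

/// Number of elements currently allocated in this thread's arena.
pub fn arena_used() -> usize {
    ARENA.with(|a| a.borrow().len())
}
```

Any module can then allocate via `ArenaVec::new(n)` without threading an arena reference through its call chain, at the price of tying each handle to the thread that created it.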
Another update: Pushing more and more data structures into the arena improves performance. As of now, I am still debugging the implementation though. Also, I haven't found a nice way to scale arena usage to more data structures other than to store the arena "control block" in a thread local variable. This has the obvious drawback that the |
@ibabushkin Putting it in a thread-local is fine. If we don't have |
I'm submitting this PR alongside my GSoC proposal (see tokio-rs/gsoc#2), providing two core changes:
* Using the `fringe` scheduler on nightly is possible again.
* `VersionVec` objects can now be allocated in an arena cleared after each iteration of the model checker. The implementation is based on the stub already present.
However, some caveats remain: It is debatable whether integrating the arena into the `Execution` struct is the right way forward, since passing around a mutable reference to the arena to allow allocation in various modules is certainly not ergonomic.
So far, performance on the workloads I've tested has been worse than expected, and plenty of other questions need to be discussed:
or some other advantage?
expected speedup?
Additionally, I was able to discover a resource leak in one of the `tokio-sync` tests using this patch set: leaking objects mocked by `loom` can cause the arena to refuse clearing if `Slice`s allocated on behalf of the tested program are still live. So if the tested program contains such a leak, the test crashes when `loom` tries to clear the arena after the leaky iteration.
Here's the relevant function from the
test:
I'm unsure whether this is worth opening a tokio issue for, as the bug is largely
irrelevant in practice, and only amounts to leaking (tiny) amounts of memory in a test.
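For readers wondering why a leak in the tested program trips up the arena at all, the mechanism described above can be sketched roughly like this. It is a simplified model under my own assumptions, not the actual implementation: the `live` counter and the `Result` return value are hypothetical, whereas the real code crashes the test instead of returning an error.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical arena that tracks how many slices are still alive.
pub struct Arena {
    live: Rc<Cell<usize>>,
}

/// Hypothetical handle to an arena allocation (backing storage omitted).
pub struct Slice {
    live: Rc<Cell<usize>>,
}

impl Arena {
    pub fn new() -> Arena {
        Arena { live: Rc::new(Cell::new(0)) }
    }

    /// Hands out a slice and bumps the live count.
    pub fn alloc(&self) -> Slice {
        self.live.set(self.live.get() + 1);
        Slice { live: Rc::clone(&self.live) }
    }

    /// Refuses to clear while slices are live, reporting how many remain.
    pub fn clear(&mut self) -> Result<(), usize> {
        match self.live.get() {
            0 => Ok(()),
            n => Err(n),
        }
    }
}

impl Drop for Slice {
    /// A properly dropped slice releases its claim on the arena.
    fn drop(&mut self) {
        self.live.set(self.live.get() - 1);
    }
}
```

A slice that is leaked (for example via `std::mem::forget` or a reference cycle) never runs `drop`, so the count stays above zero and `clear` keeps failing after that iteration.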