ADRs
Find the most performant data parallelism branch to merge back into master.
The following branches in the Epirust source code implement data parallelism in different ways:
- parallel
- map_reduce
- read_only_view
Details can be found in the doc. We benchmarked all the branches against master with different numbers of threads to find the most performant implementation as of now.
From the benchmark data, the parallel branch appears to perform better than all the other branches with a smaller number of threads.
Simulations for the benchmarks were run on Gondor.
number_of_threads is the new field added to the config file.
Use par_chunks instead of par_iter with different chunk sizes and compare the performance.
To run simulations in parallel, we use a parallel iterator from the rayon crate. It distributes the work between the threads using a work-stealing mechanism.
The idea was that if we could find an ideal chunk size and distribute the work between the threads in chunks of that size, it would be more efficient. We ran a few benchmarks for both implementations.
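The chunking idea can be sketched as follows. This is a minimal illustration only, not the Epirust code: it uses scoped threads from the standard library instead of rayon so that it runs without external crates, and update_agent, update_in_chunks and even_chunk_size are hypothetical names introduced here. In rayon, par_chunks (or par_chunks_mut) plays the role that chunks_mut plays below, with the important difference that rayon still balances the chunks across its threads by work stealing.

```rust
use std::thread;

/// Hypothetical stand-in for one agent's per-tick update (not Epirust logic).
fn update_agent(state: u64) -> u64 {
    state.wrapping_mul(31).wrapping_add(7)
}

/// Splits `agents` into fixed-size chunks and updates each chunk on its own
/// scoped thread. There is no work stealing here: a slow chunk is never
/// rebalanced onto an idle thread.
fn update_in_chunks(agents: &mut [u64], chunk_size: usize) {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    thread::scope(|scope| {
        for chunk in agents.chunks_mut(chunk_size) {
            scope.spawn(move || {
                for agent in chunk.iter_mut() {
                    *agent = update_agent(*agent);
                }
            });
        }
    });
}

/// One possible chunk size (an assumption, not a measured optimum):
/// divide the work evenly across `number_of_threads`.
fn even_chunk_size(len: usize, number_of_threads: usize) -> usize {
    let threads = number_of_threads.max(1);
    // Ceiling division, clamped to at least 1 so `chunks_mut` never panics.
    ((len + threads - 1) / threads).max(1)
}

fn main() {
    let mut agents: Vec<u64> = (0..10).collect();
    // Sequential reference result to compare against.
    let expected: Vec<u64> = agents.iter().map(|&a| update_agent(a)).collect();

    let chunk_size = even_chunk_size(agents.len(), 4);
    update_in_chunks(&mut agents, chunk_size);

    assert_eq!(agents, expected);
    println!("chunk size {chunk_size}, agents {agents:?}");
}
```

Whatever the chunk size, the result matches the sequential update; only the way the work is distributed changes, which is what the benchmarks compare.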
From the benchmark data, there is no significant difference in performance when using parallel chunks. The best performance we get with the ideal chunk size is almost the same as with the parallel iterator. We profiled both implementations using perf and generated a CPU flame graph for each. The two implementations show no difference in CPU consumption.
Simulations for the benchmarks were run locally (on a MacBook) in mini_epirust.