Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DO NOT MERGE] perf from moc artifact #51

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft

Conversation

chenyan-dfinity
Copy link
Contributor

@github-actions
Copy link

github-actions bot commented May 3, 2023

Note
Diffing the performance result against the published result from main branch.
Unchanged benchmarks are omitted.

Map

binary_size generate 50k max mem batch_get 50 batch_put 50 batch_remove 50
hashmap 196_227 2_387_017_574 9_102_052 1_293_015 689_295_883 1_224_704
triemap 201_786 2_286_509_386 9_715_900 891_925 2_111_535 1_187_671
rbtree 198_958 2_024_735_614 8_902_160 787_768 1_839_147 991_630
splay 197_868 2_305_505_782 8_702_096 1_258_328 2_175_053 1_259_845
btree 235_291 ($\textcolor{green}{-0.02\%}$) 2_116_952_097 ($\textcolor{green}{-0.25\%}$) 7_556_172 936_630 1_948_829 ($\textcolor{green}{-0.31\%}$) 991_671
zhenya_hashmap 190_491 1_855_281_618 9_301_800 745_902 1_651_310 752_198
btreemap_rs 514_775 115_994_744 1_638_400 59_433 137_855 61_795
hashmap_rs 502_772 53_333_947 1_835_008 21_070 63_601 22_484

Priority queue

Note
Same as main branch, skipping.

MoVM

Note
Same as main branch, skipping.

Basic DAO

binary_size init transfer_token submit_proposal vote_proposal
Motoko 291_432 ($\textcolor{green}{-0.02\%}$) 44_505 ($\textcolor{red}{0.13\%}$) 20_039 ($\textcolor{red}{0.58\%}$) 14_077 ($\textcolor{green}{-0.01\%}$) 16_739 ($\textcolor{red}{0.35\%}$)
Rust 940_461 541_441 102_463 125_485 137_030

DIP721 NFT

Note
Same as main branch, skipping.

Heartbeat

binary_size heartbeat
Motoko 156_821 5_324 ($\textcolor{green}{-40.70\%}$)
Rust 35_608 1_127 ($\textcolor{red}{91.99\%}$)

Timer

Note
Same as main branch, skipping.

@github-actions
Copy link

github-actions bot commented May 3, 2023

Note
The flamegraph link only works after you merge.
Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust.
The library names with _rs suffix are written in Rust; the rest are written in Motoko.

We use the same random number generator with fixed seed to ensure that all collections contain
the same elements, and the queries are exactly the same. Below we explain the measurements of each column in the table:

  • generate 50k. Insert 50k Nat32 integers into the collection. For Motoko collections, it usually triggers the GC; the rest of the column are not likely to trigger GC.
  • max mem. For Motoko, it reports rts_max_live_size after generate call; For Rust, it reports the Wasm's memory page * 32Kb.
  • batch_get 50. Find 50 elements from the collection.
  • batch_put 50. Insert 50 elements to the collection.
  • batch_remove 50. Remove 50 elements from the collection.

💎 Takeaways

  • The platform only charges for instruction count. Data structures which make use of caching and locality have no impact on the cost.
  • We have a limit on the maximal cycles per round. This means asymptotic behavior doesn't matter much. We care more about the performance up to a fixed N. In the extreme cases, you may see an O(10000 nlogn) algorithm hitting the limit, while an O(n^2) algorithm runs just fine.
  • Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit on a particular round.
  • Rust costs more cycles to process complicated Candid data, but it is more efficient in performing core computations.

Note

  • The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.
  • Due to the instrumentation overhead and cycle limit, we cannot profile computations with large collections. Hopefully, when deterministic time slicing is ready, we can measure the performance on larger memory footprint.
  • hashmap uses amortized data structure. When the initial capacity is reached, it has to copy the whole array, thus the cost of batch_put 50 is much higher than other data structures.
  • hashmap_rs uses the fxhash crate, which is the same as std::collections::HashMap, but with a deterministic hasher. This ensures reproducible result.
  • btree comes from Byron Becker's stable BTreeMap library.
  • zhenya_hashmap comes from Zhenya Usenko's stable HashMap library.
  • The MoVM table measures the performance of an experimental implementation of Motoko interpreter. External developers can ignore this table for now.

Map

binary_size generate 50k max mem batch_get 50 batch_put 50 batch_remove 50
hashmap 196_227 2_387_017_574 9_102_052 1_293_015 689_295_883 1_224_704
triemap 201_786 2_286_509_386 9_715_900 891_925 2_111_535 1_187_671
rbtree 198_958 2_024_735_614 8_902_160 787_768 1_839_147 991_630
splay 197_868 2_305_505_782 8_702_096 1_258_328 2_175_053 1_259_845
btree 235_291 2_116_952_097 7_556_172 936_630 1_948_829 991_671
zhenya_hashmap 190_491 1_855_281_618 9_301_800 745_902 1_651_310 752_198
btreemap_rs 514_775 115_994_744 1_638_400 59_433 137_855 61_795
hashmap_rs 502_772 53_333_947 1_835_008 21_070 63_601 22_484

Priority queue

binary_size heapify 50k mem pop_min 50 put 50
heap 181_726 793_253_862 1_400_024 385_321 822_756
heap_rs 473_458 5_041_433 819_200 53_243 22_092

MoVM

binary_size generate 10k max mem batch_get 50 batch_put 50 batch_remove 50
hashmap 196_227 477_464_161 1_820_844 1_291_042 138_897_096 1_222_118
hashmap_rs 502_772 10_984_247 950_272 20_385 62_907 21_374
imrc_hashmap_rs 513_980 19_919_391 1_572_864 31_519 120_207 37_618
movm_rs 2_092_441 1_017_324_475 2_654_208 2_494_635 6_477_172 5_106_080
movm_dynamic_rs 2_295_206 496_274_407 2_129_920 1_951_981 2_709_572 1_950_006

Sample Dapps

Measure the performance of some typical dapps:

  • Basic DAO,
    with heartbeat disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.
  • DIP721 NFT

Note

  • The cost difference is mainly due to the Candid serialization cost.
  • Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.
  • We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by serde.
  • For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.

Basic DAO

binary_size init transfer_token submit_proposal vote_proposal
Motoko 291_432 44_505 20_039 14_077 16_739
Rust 940_461 541_441 102_463 125_485 137_030

DIP721 NFT

binary_size init mint_token transfer_token
Motoko 244_622 13_379 24_678 5_358
Rust 1_005_637 144_162 375_896 94_757

Heartbeat / Timer

Measure the cost of empty heartbeat and timer job.

  • setTimer measures both the setTimer(0) method and the execution of empty job.
  • It is not easy to reliably capture the above events in one flamegraph, as the implementation detail
    of the replica can affect how we measure this. Typically, a correct flamegraph contains both setTimer and canister_global_timer function. If it's not there, we may need to adjust the script.

Heartbeat

binary_size heartbeat
Motoko 156_821 5_324
Rust 35_608 1_127

Timer

binary_size setTimer cancelTimer
Motoko 164_347 19_476 1_907
Rust 524_361 55_152 10_417

@ggreif
Copy link
Contributor

ggreif commented May 3, 2023

@chenyan-dfinity

against the published result from main branch

it is unclear if this refers to this repo's main branch or dfinity/motoko's master branch.
If it is the former, I would have expected the second run (baseline 0.8.8) to show less improvement.

@chenyan-dfinity
Copy link
Contributor Author

I just rerun the CI, it looks good now. It's against this repo's main branch. The main branch needs some time to update the report after the PR is merged. You probably trigger the CI too soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants