Another Stable B-Tree. #90

matthewhammer · 2023-09-21T16:19:22Z

No description provided.

github-actions · 2023-09-21T16:39:52Z

Note
Diffing the performance results against the published results from the main branch.
Unchanged benchmarks are omitted.

Map

Note
Same as main branch, skipping.

Priority queue

Note
Same as main branch, skipping.

Growable array

Note
Same as main branch, skipping.

Warning
Skipping table 3 (Stable structures) from _out/collections/README.md because its shape does not match the corresponding table on the main branch.

Statistics

binary_size: no change
max_mem: no change
cycles: no change

Basic DAO

	binary_size	init	transfer_token	submit_proposal	vote_proposal	upgrade
Motoko	236_673	491_790	16_290	12_672	14_114 ($\textcolor{green}{-0.16\%}$)	122_439
Rust	806_537	541_266	86_052	107_287	117_056	1_686_510

DIP721 NFT

Note
Same as main branch, skipping.

Statistics

binary_size: no change
max_mem: no change
cycles: -0.16%

Heartbeat

	binary_size	heartbeat
Motoko	123_509	7_399 ($\textcolor{red}{96.89\%}$)
Rust	23_826	469 ($\textcolor{green}{-40.25\%}$)

Timer

Note
Same as main branch, skipping.

Statistics

binary_size: no change
max_mem: no change
cycles: no change

Publisher & Subscriber

	pub_binary_size	sub_binary_size	subscribe_caller	subscribe_callee	publish_caller	publish_callee
Motoko	144_583	131_443	14_660 ($\textcolor{red}{0.06\%}$)	8_456	10_539	3_669
Rust	477_393	527_108	51_497	34_484	74_218	44_132

Statistics

binary_size: no change
max_mem: no change
cycles: 0.06%

Overall Statistics

binary_size: no change
max_mem: no change
cycles: -0.05% [-0.73%, 0.64%]

github-actions · 2023-09-21T16:39:54Z

Note
The flamegraph link only works after you merge.
Unchanged benchmarks are omitted.

Collection libraries

Measure different collection libraries written in both Motoko and Rust.
The library names with _rs suffix are written in Rust; the rest are written in Motoko.
The _stable and _stable_rs suffixes indicate that the library writes its state directly to stable memory, using Region in Motoko and ic-stable-structures in Rust.

We use the same random number generator with fixed seed to ensure that all collections contain
the same elements, and the queries are exactly the same. Below we explain the measurements of each column in the table:

generate 1m. Insert 1m Nat64 integers into the collection. For Motoko collections, this usually triggers the GC; the other columns are unlikely to trigger it.
max mem. For Motoko, it reports rts_max_heap_size after the generate call; for Rust, it reports the number of Wasm memory pages times 64 KiB; for stable benchmarks, it reports the size of the stable memory region storing the map.
batch_get 50. Find 50 elements from the collection.
batch_put 50. Insert 50 elements into the collection.
batch_remove 50. Remove 50 elements from the collection.
upgrade. Upgrade the canister with the same Wasm module. For non-stable benchmarks, the map state is persisted by serializing and deserializing states into stable memory. For stable benchmarks, the upgrade only needs to initialize the metadata, as the state is already in the stable memory.
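
As a minimal sketch of the fixed-seed idea described above, the following Rust snippet generates an identical key sequence on every run. It is an assumption for illustration only: the benchmark's actual generator is not shown in this report, and the names `SplitMix64` and `sample_keys`, as well as any seed value, are hypothetical.

```rust
/// Hypothetical deterministic generator (SplitMix64). This is an
/// illustrative assumption, not the generator used by the benchmark.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    /// Advance the state and return the next pseudo-random u64.
    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Produce `n` keys from a fixed seed, so that every collection under test
/// can be fed exactly the same elements and queries.
pub fn sample_keys(seed: u64, n: usize) -> Vec<u64> {
    let mut rng = SplitMix64::new(seed);
    (0..n).map(|_| rng.next_u64()).collect()
}
```

Calling `sample_keys` twice with the same seed returns the same vector, which is the property that keeps the measurements comparable across libraries.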

💎 Takeaways

The platform only charges for instruction count. Data structures that exploit caching and locality therefore gain no cost advantage.
We have a limit on the maximal cycles per round. This means asymptotic behavior doesn't matter much. We care more about the performance up to a fixed N. In the extreme cases, you may see an $O(10000 n\log n)$ algorithm hitting the limit, while an $O(n^2)$ algorithm runs just fine.
Amortized algorithms/GC may need to be more eager to avoid hitting the cycle limit on a particular round.
Rust costs more cycles to process complicated Candid data, but it is more efficient in performing core computations.
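
The point about a fixed per-round limit can be made concrete with a rough sketch. The cost formulas below simply take the two complexity expressions from the takeaway at face value, and any budget passed to `fits` is a hypothetical number, not the platform's real limit.

```rust
/// Estimated instruction count for a hypothetical O(n log n) algorithm with
/// a large constant factor of 10_000.
pub fn cost_nlogn(n: u64) -> f64 {
    let n = n as f64;
    10_000.0 * n * n.log2()
}

/// Estimated instruction count for a hypothetical O(n^2) algorithm with a
/// constant factor of 1.
pub fn cost_quadratic(n: u64) -> f64 {
    let n = n as f64;
    n * n
}

/// True when the estimated cost fits within the given per-round budget.
/// The budget is supplied by the caller; no real platform limit is assumed.
pub fn fits(cost: f64, budget: f64) -> bool {
    cost <= budget
}
```

With a hypothetical budget of 50 million instructions and n = 1000, the quadratic estimate (10^6) fits while the n log n estimate (about 10^8) does not. The ordering only flips for much larger n.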

Note

The Candid interface of the benchmark is minimal, therefore the serialization cost is negligible in this measurement.

Due to the instrumentation overhead and cycle limit, we cannot profile computations with very large collections.

The upgrade column uses Candid for serializing stable data. In Rust, you may get better cycle cost by using a different serialization format. Another slowdown in Rust is that ic-stable-structures tends to be slower than the region memory in Motoko.

Different libraries persist data across upgrades in different ways, which fall into three main categories:

Use stable variable directly in Motoko: zhenya_hashmap, btree, vector

Expose and serialize external state (share/unshare in Motoko, candid::Encode in Rust): rbtree, heap, btreemap_rs, hashmap_rs, heap_rs, vec_rs

Use pre/post-upgrade hooks to convert data into an array: hashmap, splay, triemap, buffer, imrc_hashmap_rs

The stable benchmarks are much more expensive than their non-stable counterparts, because the stable memory API is much more expensive. The benefit is that upgrades are fast, although the upgrade still needs to parse the metadata when initializing the upgraded Wasm module.

hashmap uses an amortized data structure. When the initial capacity is reached, it has to copy the whole array, so the cost of batch_put 50 is much higher than for the other data structures.

btree comes from mops.one/stableheapbtreemap.

btree_stable comes from github.com/sardariuss.

zhenya_hashmap comes from mops.one/map.

vector comes from mops.one/vector. Compared with buffer, put has better worst-case time and space complexity ($O(\sqrt{n})$ vs $O(n)$); get has a slightly larger constant overhead.

hashmap_rs uses the fxhash crate, which provides the same std::collections::HashMap but with a deterministic hasher. This ensures reproducible results.

imrc_hashmap_rs uses the im-rc crate, which provides an immutable hash map for Rust.
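
The cost spike that the note on hashmap attributes to copying the whole array can be illustrated with a small sketch. `DoublingArray` is a hypothetical stand-in that doubles its capacity when full; it is not the Motoko hashmap implementation, and the growth factor is an assumption.

```rust
/// Hypothetical growable table that doubles its capacity when full and
/// reports how many existing elements each insertion had to copy.
pub struct DoublingArray {
    items: Vec<u64>,
    capacity: usize,
}

impl DoublingArray {
    pub fn with_capacity(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        DoublingArray {
            items: Vec::with_capacity(capacity),
            capacity,
        }
    }

    /// Insert a value and return the number of existing elements copied.
    pub fn push(&mut self, value: u64) -> usize {
        let mut copied = 0;
        if self.items.len() == self.capacity {
            // Full: allocate a table twice as large and copy everything over.
            let mut bigger = Vec::with_capacity(self.capacity * 2);
            bigger.extend_from_slice(&self.items);
            copied = self.items.len();
            self.items = bigger;
            self.capacity *= 2;
        }
        self.items.push(value);
        copied
    }
}
```

Most insertions copy nothing, but the one that hits the capacity copies every existing element. Averaged over many insertions the cost is small, yet a single batch that triggers the resize pays for all of it at once.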

Map

	binary_size	generate 1m	max mem	batch_get 50	batch_put 50	batch_remove 50	upgrade
hashmap	160_033	6_984_044_834	61_987_792	288_670	5_536_856_465	310_195	9_128_777_557
triemap	163_286	11_463_656_817	74_216_112	222_926	549_435	540_205	13_075_150_332
rbtree	157_961	5_979_230_865	57_996_000	88_905	268_573	278_339	5_771_873_746
splay	159_768	11_568_250_977	53_995_936	552_014	581_765	810_321	3_722_468_031
btree	187_709	8_224_242_624	31_103_952	277_542	384_171	429_041	2_517_935_226
zhenya_hashmap	160_321	2_201_622_488	22_773_040	48_627	61_839	70_872	2_695_441_915
btreemap_rs	494_261	1_654_113_949	27_590_656	66_889	112_603	81_249	2_401_229_430
imrc_hashmap_rs	500_199	2_407_082_660	244_973_568	32_962	163_913	98_591	5_209_975_418
hashmap_rs	487_986	403_296_624	73_138_176	17_350	21_647	20_615	957_579_445
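
For the Rust rows, the max mem column is defined above as the number of Wasm memory pages times 64 KiB. A small sketch of that conversion follows; `WASM_PAGE_BYTES`, `pages_to_bytes`, and `bytes_to_pages` are illustrative names, not part of the benchmark code.

```rust
/// Size of one Wasm memory page in bytes (64 KiB).
pub const WASM_PAGE_BYTES: u64 = 64 * 1024;

/// Convert a page count into bytes, as reported in the max mem column.
pub fn pages_to_bytes(pages: u64) -> u64 {
    pages * WASM_PAGE_BYTES
}

/// Recover the page count from a byte value.
/// Returns None when the value is not a whole number of pages.
pub fn bytes_to_pages(bytes: u64) -> Option<u64> {
    if bytes % WASM_PAGE_BYTES == 0 {
        Some(bytes / WASM_PAGE_BYTES)
    } else {
        None
    }
}
```

For example, the 27_590_656 bytes reported for btreemap_rs correspond to 421 pages. The Motoko rows report a heap size instead, so they need not be multiples of the page size.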

Priority queue

	binary_size	heapify 1m	max mem	pop_min 50	put 50	pop_min 50	upgrade
heap	147_450	4_684_518_110	29_995_896	511_505	186_471	487_212	2_655_603_064
heap_rs	481_753	123_102_208	18_284_544	53_480	18_264	53_621	349_011_816

Growable array

	binary_size	generate 5k	max mem	batch_get 500	batch_put 500	batch_remove 500	upgrade
buffer	150_816	2_082_623	65_584	73_092	671_517	127_592	2_468_118
vector	152_363	1_588_260	24_520	105_191	149_932	148_094	3_837_918
vec_rs	480_829	265_643	1_376_256	12_986	25_331	21_215	2_854_587

Stable structures

	binary_size	generate 50k	max mem	batch_get 50	batch_put 50	batch_remove 50	upgrade
btree	187_709	351_889_192	1_554_092	219_328	337_463	368_143	125_807_039
btree_stable	205_968	13_079_338_699	2_621_440	10_597_031	14_476_166	28_516_711	25_129
btreemap_rs	494_261	70_231_886	2_555_904	57_208	86_708	79_740	100_477_350
btreemap_stable_rs	498_479	3_676_196_177	2_621_440	2_190_807	4_013_463	6_777_299	714_487
heap_rs	481_753	6_214_821	2_293_760	45_761	18_496	45_732	18_367_724
heap_stable_rs	469_772	240_377_401	458_752	2_038_566	209_047	2_023_426	714_446
vec_rs	480_829	2_866_842	2_293_760	12_986	14_081	13_678	16_575_110
vec_stable_rs	465_410	55_585_887	458_752	52_650	67_745	69_641	714_440
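
To make the trade-off in the table above concrete, the sketch below compares the btree and btree_stable rows for batch_get 50 and upgrade. The numbers are copied from the table; the helper and constant names are hypothetical.

```rust
/// Ratio of two instruction counts as a floating-point factor.
pub fn ratio(numerator: u64, denominator: u64) -> f64 {
    numerator as f64 / denominator as f64
}

// Values copied from the btree and btree_stable rows of the table above.
pub const BTREE_BATCH_GET: u64 = 219_328;
pub const BTREE_STABLE_BATCH_GET: u64 = 10_597_031;
pub const BTREE_UPGRADE: u64 = 125_807_039;
pub const BTREE_STABLE_UPGRADE: u64 = 25_129;

/// How many times more expensive batch_get 50 is in btree_stable.
pub fn batch_get_slowdown() -> f64 {
    ratio(BTREE_STABLE_BATCH_GET, BTREE_BATCH_GET)
}

/// How many times cheaper upgrade is in btree_stable.
pub fn upgrade_speedup() -> f64 {
    ratio(BTREE_UPGRADE, BTREE_STABLE_UPGRADE)
}
```

On these numbers, batch_get 50 is roughly 48 times more expensive in btree_stable, while upgrade is roughly 5000 times cheaper.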

Sample Dapps

Measure the performance of some typical dapps:

Basic DAO, with heartbeat disabled to make profiling easier. We have a separate benchmark to measure heartbeat performance.
DIP721 NFT

Note

The cost difference is mainly due to the Candid serialization cost.

Motoko statically compiles/specializes the serialization code for each method, whereas in Rust, we use serde to dynamically deserialize data based on data on the wire.

We could improve the performance on the Rust side by using parser combinators. But it is a challenge to maintain the ergonomics provided by serde.

For real-world applications, we tend to send small data for each endpoint, which makes the Candid overhead in Rust tolerable.

Basic DAO

	binary_size	init	transfer_token	submit_proposal	vote_proposal	upgrade
Motoko	236_673	491_790	16_290	12_672	14_114	122_439
Rust	806_537	541_266	86_052	107_287	117_056	1_686_510

DIP721 NFT

	binary_size	init	mint_token	transfer_token	upgrade
Motoko	194_938	466_439	22_357	4_729	65_612
Rust	820_893	210_081	324_368	81_020	1_860_416

Heartbeat / Timer

Measure the cost of an empty heartbeat and an empty timer job.

setTimer measures both the setTimer(0) call and the execution of the empty job.
It is not easy to reliably capture both events in one flamegraph, because implementation details of the replica can affect how we measure this. Typically, a correct flamegraph contains both the setTimer and canister_global_timer functions. If either is missing, we may need to adjust the script.

Heartbeat

	binary_size	heartbeat
Motoko	123_509	7_399
Rust	23_826	469

Timer

	binary_size	setTimer	cancelTimer
Motoko	129_780	15_227	1_684
Rust	441_467	43_465	7_594

Publisher & Subscriber

Measure the cost of inter-canister calls from the Publisher & Subscriber example.

	pub_binary_size	sub_binary_size	subscribe_caller	subscribe_callee	publish_caller	publish_callee
Motoko	144_583	131_443	14_660	8_456	10_539	3_669
Rust	477_393	527_108	51_497	34_484	74_218	44_132

Another Stable B-Tree.

ba18267

matthewhammer and others added 21 commits September 21, 2023 10:52

add canister for StableBTree

1672565

add lines to scripts.

5e3279b

add stable benchmark

5517ad1

fix

4d75426

adjust to 800k

16789ef

fix

a66115e

600k

f1ed547

fix

42cbdc5

fix

15551ee

vec_stable

ce9dc97

heap_stable

bf08e4f

readme

ef7c7db

fix

a086a5e

Merge remote-tracking branch 'origin/main' into matthew/sardariussBTree

3b6a80d

Merge remote-tracking branch 'origin/stable' into matthew/sardariussBTree

35a1afe

rename

233ced1

fix

2c197ea

fix

6f0bc84

Merge remote-tracking branch 'origin/main' into matthew/sardariussBTree

2c6ea7c

fix

8120365

fix

5045744

crusso mentioned this pull request Oct 5, 2023

A more optimized Stable B-Tree (main, unoptimized branch) #94

Draft
