feat(experiment): add a parallel HAMT traversal function #103

aschmahmann · 2023-01-17T06:21:58Z

This is an implementation of a parallel traversal of the HAMT. It's mostly copied from https://github.com/ipfs/go-unixfs/blob/323bb63cafa93c5bdd10170f35b96327760d7c1a/hamt/hamt.go#L402.

There are a few pieces of this that are obviously wrong or non-optimal (e.g. the GetMany interface, lack of options for concurrency, not caching pointers, etc.) however this seems like a good starting point for a discussion on if this PR is even wanted/acceptable for this repo.

While I don't expect lotus will make use of this function because blockchains tend to avoid parallel things, it could be useful in processing. I made some use of it recently and it was a big performance help.

cc @rvagg @frrist @ribasushi

frrist · 2023-01-17T17:41:07Z

hamt.go

+
+// parallelShardWalk walks the HAMT concurrently processing callbacks upon encountering leaf nodes
+func parallelShardWalk(ctx context.Context, root *Node, processShardValues func(k string, val *cbg.Deferred) error) error {
+	const concurrency = 16 // TODO: should be an option, also this number was basically made up with a bit of empirical testing/usage


It should be easy enough to make this an arg to ForEachParallel, yeah? Doing so would match the behavior of ParallelDiff.

frrist · 2023-01-17T17:49:59Z

We'd make use of this logic in lily when loading things such as actor states (as you are already doing in filexp), miner sectors, miner precommits, and any other information stored in HAMTs that we need to load all at once.

Also interested in having similar logic implemented for go-amt-ipld, which could be used for loading miner sectors from their respective partitions.

rvagg · 2023-01-18T06:23:18Z

seems fine to me, if it it gets cleaned up and some basic tests added

https://github.com/ipfs/go-unixfs/blob/323bb63cafa93c5bdd10170f35b96327760d7c1a/hamt/hamt.go#L478 is kind of like https://github.com/ipfs/go-merkledag/blob/5c067b1958df79db6a28b57d48f16f94de1f7fd8/merkledag.go#L455 in terms of interface, maybe some commonalities to be extracted from there if we're going for consistency

aschmahmann · 2023-01-18T17:22:44Z

seems fine to me, if it it gets cleaned up and some basic tests added

Great! Any thoughts on what to do about a multiple block requesting interface here? My main concern is that it seems wasteful/excessive to do one goroutine (or channel) per block and to allow some sort of grouping.

It'll probably have to be a bit custom to deal with the IpldStore semantics expected here but overall the 3 mechanisms I've seen so far are:

Just do GetBlock repeatedly using:
a. sequential calls
b. a goroutine per block
c. worker pools
d. a channel per block
Grouping block calls together like GetMany and returning a channel
Having a channel in for requests/cancellations and an out channel for blocks as in https://github.com/ipfs/go-bitswap/pull/593/files#diff-b751089c21e2437423d20fbee6c9a01a61c2422d5f7630ab8272d7fdeb63b654R17

If there are going to be custom IpldStore semantics should I just add an interface declaration here for now, or upstream to go-ipld-cbor and depend on that?

If we're ok with this breaking a bit/being less stable than the rest of the package I can just choose one to experiment with, define the interface here, and we can doing breaking changes as needed. WDYT?

is kind of like https://github.com/ipfs/go-merkledag/blob/5c067b1958df79db6a28b57d48f16f94de1f7fd8/merkledag.go#L455 in terms of interface, maybe some commonalities to be extracted from there if we're going for consistency

Yep, the go-merkledag code was copy-paste-modified into go-unixfs and from go-unixfs to here. The go-unixfs case is a little closer because it also has cached deserialized data but yeah same idea.

feat(experiment): add a parallel HAMT traversal function

a9b3d6b

frrist reviewed Jan 17, 2023

View reviewed changes

This was referenced May 24, 2023

Node sink and selector suffixes support vulcanize/go-hamt-ipld#1

Merged

Node sink and selector suffixes support vulcanize/go-amt-ipld#1

Merged

i-norden mentioned this pull request Jun 12, 2023

Concurrent iterator methods for go-amt-ipld and go-hamt-ipld vulcanize/filecoin-indexing#21

Closed

This was referenced Oct 17, 2023

fix: store: expCid vs actualCid guard ipfs/go-ipld-cbor#96

Merged

feat(experiment): add a parallel HAMT traversal function (continued) #115

Open

feat(experiment): add a parallel AMT traversal function filecoin-project/go-amt-ipld#84

Open

aschmahmann mentioned this pull request Oct 25, 2023

feat: blockstore: GetMany blockstore method ipfs/boxo#492

Open

ribasushi mentioned this pull request Aug 24, 2024

feat: store: BatchOpStore (GetMany/PutMany methods) ipfs/go-ipld-cbor#97

Draft

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(experiment): add a parallel HAMT traversal function #103

feat(experiment): add a parallel HAMT traversal function #103

aschmahmann commented Jan 17, 2023

frrist Jan 17, 2023

frrist commented Jan 17, 2023

rvagg commented Jan 18, 2023

aschmahmann commented Jan 18, 2023

feat(experiment): add a parallel HAMT traversal function #103

Are you sure you want to change the base?

feat(experiment): add a parallel HAMT traversal function #103

Conversation

aschmahmann commented Jan 17, 2023

frrist Jan 17, 2023

Choose a reason for hiding this comment

frrist commented Jan 17, 2023

rvagg commented Jan 18, 2023

aschmahmann commented Jan 18, 2023