Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(experiment): add a parallel HAMT traversal function #103

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

aschmahmann
Copy link

This is an implementation of a parallel traversal of the HAMT. It's mostly copied from https://github.com/ipfs/go-unixfs/blob/323bb63cafa93c5bdd10170f35b96327760d7c1a/hamt/hamt.go#L402.

There are a few pieces of this that are obviously wrong or non-optimal (e.g. the GetMany interface, lack of options for concurrency, not caching pointers, etc.) however this seems like a good starting point for a discussion on if this PR is even wanted/acceptable for this repo.

While I don't expect lotus will make use of this function because blockchains tend to avoid parallel things, it could be useful in processing. I made some use of it recently and it was a big performance help.

cc @rvagg @frrist @ribasushi


// parallelShardWalk walks the HAMT concurrently processing callbacks upon encountering leaf nodes
func parallelShardWalk(ctx context.Context, root *Node, processShardValues func(k string, val *cbg.Deferred) error) error {
const concurrency = 16 // TODO: should be an option, also this number was basically made up with a bit of empirical testing/usage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be easy enough to make this an arg to ForEachParallel, yeah? Doing so would match the behavior of ParallelDiff.

@frrist
Copy link
Member

frrist commented Jan 17, 2023

We'd make use of this logic in lily when loading things such as actor states (as you are already doing in filexp), miner sectors, miner precommits, and any other information stored in HAMTs that we need to load all at once.

Also interested in having similar logic implemented for go-amt-ipld, which could be used for loading miner sectors from their respective partitions.

@rvagg
Copy link
Member

rvagg commented Jan 18, 2023

seems fine to me, if it it gets cleaned up and some basic tests added

https://github.com/ipfs/go-unixfs/blob/323bb63cafa93c5bdd10170f35b96327760d7c1a/hamt/hamt.go#L478 is kind of like https://github.com/ipfs/go-merkledag/blob/5c067b1958df79db6a28b57d48f16f94de1f7fd8/merkledag.go#L455 in terms of interface, maybe some commonalities to be extracted from there if we're going for consistency

@aschmahmann
Copy link
Author

seems fine to me, if it it gets cleaned up and some basic tests added

Great! Any thoughts on what to do about a multiple block requesting interface here? My main concern is that it seems wasteful/excessive to do one goroutine (or channel) per block and to allow some sort of grouping.

It'll probably have to be a bit custom to deal with the IpldStore semantics expected here but overall the 3 mechanisms I've seen so far are:

  1. Just do GetBlock repeatedly using:
    a. sequential calls
    b. a goroutine per block
    c. worker pools
    d. a channel per block
  2. Grouping block calls together like GetMany and returning a channel
  3. Having a channel in for requests/cancellations and an out channel for blocks as in https://github.com/ipfs/go-bitswap/pull/593/files#diff-b751089c21e2437423d20fbee6c9a01a61c2422d5f7630ab8272d7fdeb63b654R17

If there are going to be custom IpldStore semantics should I just add an interface declaration here for now, or upstream to go-ipld-cbor and depend on that?

If we're ok with this breaking a bit/being less stable than the rest of the package I can just choose one to experiment with, define the interface here, and we can doing breaking changes as needed. WDYT?

is kind of like https://github.com/ipfs/go-merkledag/blob/5c067b1958df79db6a28b57d48f16f94de1f7fd8/merkledag.go#L455 in terms of interface, maybe some commonalities to be extracted from there if we're going for consistency

Yep, the go-merkledag code was copy-paste-modified into go-unixfs and from go-unixfs to here. The go-unixfs case is a little closer because it also has cached deserialized data but yeah same idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants