Finding the hash of a program in Fuel VM: hashing bytecode or use MAST root? #794

Open
partylikeits1983 opened this issue Jul 24, 2024 · 1 comment

Comments

@partylikeits1983

I am curious how the hash of a Fuel predicate is computed. As I understand it, the predicate root is computed by the root_from_code function:

    pub fn root_from_code<B>(bytes: B) -> Bytes32
    where
        B: AsRef<[u8]>,
    {
        let mut tree = BinaryMerkleTree::new();
        bytes.as_ref().chunks(LEAF_SIZE).for_each(|leaf| {
            // If the bytecode is not a multiple of LEAF_SIZE, the final leaf
            // should be zero-padded rounding up to the nearest multiple of 8
            // bytes.
            let len = leaf.len();
            if len == LEAF_SIZE || len % MULTIPLE == 0 {
                tree.push(leaf);
            } else {
                let padding_size = len.next_multiple_of(MULTIPLE);
                let mut padded_leaf = [PADDING_BYTE; LEAF_SIZE];
                padded_leaf[0..len].clone_from_slice(leaf);
                tree.push(padded_leaf[..padding_size].as_ref());
            }
        });

        tree.root().into()
    }
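
To make the leaf layout concrete, here is a minimal sketch of just the chunking and padding step, without the tree. The constant values are assumptions on my part (16 KiB leaves taken from the spec wording, a multiple of 8 and a zero padding byte taken from the comment in the function), and `padded_leaves` is a hypothetical helper, not part of the Fuel codebase.

```rust
// Assumed values: 16 KiB leaves, 8-byte alignment, zero padding.
const LEAF_SIZE: usize = 16 * 1024;
const MULTIPLE: usize = 8;
const PADDING_BYTE: u8 = 0;

/// Hypothetical helper: returns the leaves in the form they would be
/// pushed into the tree, i.e. each chunk zero-padded up to the next
/// multiple of 8 bytes. Full-size chunks are already aligned and are
/// returned unchanged.
fn padded_leaves(bytes: &[u8]) -> Vec<Vec<u8>> {
    bytes
        .chunks(LEAF_SIZE)
        .map(|leaf| {
            let padded_len = leaf.len().next_multiple_of(MULTIPLE);
            let mut out = leaf.to_vec();
            out.resize(padded_len, PADDING_BYTE);
            out
        })
        .collect()
}
```

Under these assumptions, 20 bytes of bytecode produce one 24-byte leaf, and `LEAF_SIZE + 3` bytes produce one full leaf followed by one 8-byte leaf. Only the final chunk can ever need padding.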

Looking at this implementation, however, I wonder why a Merkle tree is used to compute the hash of the predicate. The compiled bytecode of the program is passed in, split into chunks of LEAF_SIZE bytes, each chunk is pushed into the Merkle tree as a leaf, and the root of the tree is returned.

Using a Merkle tree to hash a program resembles the concept of a Merkleized abstract syntax tree (MAST). A MAST root, however, is a hash of the entire program in which the leaves of the tree are the "subprograms" of the program. MAST was proposed for Bitcoin in BIP 114: https://github.com/bitcoin/bips/blob/master/bip-0114.mediawiki

My question is: why is a Merkle tree used to compute the hash of a Fuel program? The root_from_code function does not seem to need one, since all it does is split the bytecode into chunks, push them into a Merkle tree, and return the root. Because the chunks are cut at fixed byte offsets, a single leaf can contain bytecode from several different logic flows of the program, so the leaves do not correspond to subprograms the way they would in a MAST.

Why not just apply a standard hash function such as keccak256 to the whole bytecode? What does the Merkle tree add in this case?
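
For reference, the one general property I know of that separates a Merkle root from a flat hash is that a single chunk can be checked against the root using only a short sibling path, without the rest of the data. I do not know whether that is the motivation here. The sketch below illustrates the property with a toy tree: `DefaultHasher` stands in for a real cryptographic hash, and the tree shape (odd nodes promoted unchanged) is my own simplification, not necessarily what Fuel's BinaryMerkleTree does. All function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Stand-in 64-bit hash, NOT cryptographically secure and NOT Fuel's hash.
fn leaf_hash(chunk: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u8(0); // domain tag for leaves
    h.write(chunk);
    h.finish()
}

fn node_hash(left: u64, right: u64) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u8(1); // domain tag for inner nodes
    h.write_u64(left);
    h.write_u64(right);
    h.finish()
}

/// Returns the toy root and the sibling path for leaf `index`.
/// Each path entry is (sibling hash, sibling is on the left).
/// Panics if `chunks` is empty.
fn root_and_proof(chunks: &[&[u8]], index: usize) -> (u64, Vec<(u64, bool)>) {
    let mut level: Vec<u64> = chunks.iter().map(|c| leaf_hash(c)).collect();
    let mut idx = index;
    let mut proof = Vec::new();
    while level.len() > 1 {
        let sibling = idx ^ 1;
        if sibling < level.len() {
            proof.push((level[sibling], sibling < idx));
        }
        // Pair up nodes; an unpaired last node moves up unchanged.
        level = level
            .chunks(2)
            .map(|p| if p.len() == 2 { node_hash(p[0], p[1]) } else { p[0] })
            .collect();
        idx /= 2;
    }
    (level[0], proof)
}

/// Checks one chunk against the root using only its sibling path.
fn verify(chunk: &[u8], proof: &[(u64, bool)], root: u64) -> bool {
    let mut acc = leaf_hash(chunk);
    for &(sibling, sibling_is_left) in proof {
        acc = if sibling_is_left {
            node_hash(sibling, acc)
        } else {
            node_hash(acc, sibling)
        };
    }
    acc == root
}
```

With a flat hash such as keccak256 over the whole bytecode, the verifier would need every byte to recompute the digest. With the tree, the path grows with the logarithm of the number of leaves.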

@partylikeits1983

Looking through the documentation again, I noticed this:

"[The predicate root] is the Merkle root of the binary Merkle tree each leaf being 16KiB of instructions."

https://github.com/FuelLabs/fuel-specs/blob/master/src/identifiers/predicate-id.md

However, it seems that root_from_code does not build each leaf from 16KiB of instructions, but from 16KiB of compiled bytecode.

How could this cause an issue? Different compiler versions could output different bytecode for the same source program, so the same predicate compiled with two compiler versions could end up with two different predicate roots.
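
A minimal sketch of that concern, assuming only that the root is a function of the raw bytes. The two byte strings are made-up placeholders for the output of two compiler versions, `toy_root` is a hypothetical helper, and `DefaultHasher` again stands in for the real hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Stand-in hash, NOT the hash Fuel uses.
fn toy_hash(tag: u8, data: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write_u8(tag);
    h.write(data);
    h.finish()
}

/// Hypothetical toy root over fixed-size chunks of raw bytes.
/// `leaf_size` must be non-zero.
fn toy_root(bytes: &[u8], leaf_size: usize) -> u64 {
    let mut level: Vec<u64> = bytes
        .chunks(leaf_size)
        .map(|chunk| toy_hash(0, chunk))
        .collect();
    if level.is_empty() {
        return toy_hash(0, &[]);
    }
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|p| {
                if p.len() == 2 {
                    let mut buf = [0u8; 16];
                    buf[..8].copy_from_slice(&p[0].to_be_bytes());
                    buf[8..].copy_from_slice(&p[1].to_be_bytes());
                    toy_hash(1, &buf)
                } else {
                    p[0] // unpaired node moves up unchanged
                }
            })
            .collect();
    }
    level[0]
}

// Placeholder bytes: same hypothetical source, two hypothetical builds
// that differ in a single byte of the first 4-byte word.
const BUILD_A: [u8; 8] = [0x10, 0x00, 0x00, 0x01, 0x24, 0x00, 0x00, 0x00];
const BUILD_B: [u8; 8] = [0x10, 0x00, 0x00, 0x02, 0x24, 0x00, 0x00, 0x00];
```

Calling `toy_root(&BUILD_A, 4)` and `toy_root(&BUILD_B, 4)` gives two different values, even though both builds are meant to represent the same program. The same would hold for a flat hash of the bytecode, so this is a property of committing to compiled bytes rather than of the tree itself.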
