This tool enables high-speed writes from a Substreams-based Subgraph and is 100% compatible with `graph-node`'s injection behavior (writing to Postgres), including proofs of indexing.
It is an optional injection method that trades slightly more involved devops work for much faster injection: one to two orders of magnitude faster.
(This repository was previously named `substreams-sink-graphcsv`.)
- A Substreams package with a map module outputting `proto:substreams.entity.v1.EntityChanges`. By convention, we use the module name `graph_out`.
- The `substreams-entity-change` crate, which contains the Rust objects and helpers to accelerate development of Substreams-based Subgraphs.

Install the tool:

go install github.com/streamingfast/substreams-graph-load/cmd/graphload@latest
Note: to connect to Substreams you will need an authentication token; follow this guide.
- Determine the highest block that you want to index, aligned with your bundle size. If your bundle size is 10000 and the Ethereum mainnet head block is at 17238813, your `STOP_BLOCK` will be 17230000.
- Write the entities to disk, from Substreams:

graphload run --chain-id=ethereum/mainnet --graphql-schema=/path/to/schema.graphql --bundle-size=10000 /tmp/substreams-entities mainnet.eth.streamingfast.io:443 ./substreams-v0.0.1.spkg graph_out 17230000
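The stop-block rounding above is plain integer arithmetic. A minimal shell sketch (the variable names here are illustrative, not `graphload` flags):

```shell
# Round the chain head down to a multiple of the bundle size.
HEAD_BLOCK=17238813
BUNDLE_SIZE=10000
STOP_BLOCK=$(( HEAD_BLOCK / BUNDLE_SIZE * BUNDLE_SIZE ))
echo "$STOP_BLOCK"   # → 17230000
```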
- Produce the CSV files based on an already-processed dump of entities:
for entity in $(graphload list-entities /path/to/schema.graphql); do
graphload tocsv /tmp/substreams-entities /tmp/substreams-csv $entity 17230000 --bundle-size=10000 --graphql-schema=/path/to/schema.graphql
done
- Verify that all CSV files were produced (from the start block, rounded down to the bundle size, up to the stop block):
ls /tmp/substreams-csv/*
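As a quick sanity check, you can compute how many bundles to expect per entity. A sketch using the start and stop blocks from this walkthrough (the per-entity file layout under `/tmp/substreams-csv` is an assumption, so the comparison command is only indicative):

```shell
# One bundle per BUNDLE_SIZE blocks between start and stop.
START_BLOCK=12360000   # start block, rounded down to the bundle size
STOP_BLOCK=17230000
BUNDLE_SIZE=10000
EXPECTED=$(( (STOP_BLOCK - START_BLOCK) / BUNDLE_SIZE ))
echo "expecting $EXPECTED bundles per entity"
# Compare against the actual file count, e.g.:
#   ls /tmp/substreams-csv/<entity> | wc -l
```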
- Stop indexing on your node:
graphman -c /etc/graph-node/config.toml unassign QmABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqr
- Truncate the data in your newly deployed subgraph, using `graphman rewind` or a SQL `truncate`:
# for a subgraph/substreams that starts at block `12369620`
## Before graphman commit b3e8ad1c1b2446c36b93a47b301bceca69f71dca
# Select the block number and its hash (from your favorite block explorer) that is one block BELOW the actual start block
graphman -c /etc/graph-node/config.toml rewind 0x6a3bb2ef0a20f5503495238e54fef236659f56f1c57e1602b0de2b3d799fe154 12369620 QmABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqr --force
# If your subgraph starts at block 0, you cannot use this 'rewind' technique. You will have to call `truncate` on each of the tables from a PostgreSQL shell.
## After graphman commit b3e8ad1c1b2446c36b93a47b301bceca69f71dca
graphman -c /etc/graph-node/config.toml rewind --start-block QmABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqr
- If you are removing indexes, now is the time to do it (see section Postgresql indexes speedup)
- Inject the CSV files into Postgres. The files are listed from `/tmp/substreams-csv/{entity}`:
for entity in $(graphload list-entities /path/to/schema.graphql); do
graphload inject-csv QmABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqr /tmp/substreams-csv $entity /path/to/schema.graphql 'postgresql://user:[email protected]:5432/database' 12360000 17230000
done
- Inform `graph-node` of the latest indexed block:
graphload handoff QmABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqr 0x0bdf3e2805450d917fbedb4d6f930d34261c3189eb14274e0b113302b28e59fe 17229999 'postgresql://user:[email protected]:5432/database'
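The block handed off is the last fully indexed block, i.e. the stop block minus one, which is why 17229999 appears above when the stop block is 17230000:

```shell
STOP_BLOCK=17230000
HANDOFF_BLOCK=$(( STOP_BLOCK - 1 ))
echo "$HANDOFF_BLOCK"   # → 17229999
```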
- If you removed indexes, now is the time to create them (see section Postgresql indexes speedup)
- Restart `graph-node` indexing:
graphman -c /etc/graph-node/config.toml reassign QmABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqr default
The `inject-csv` command can run even faster if the indexes have been dropped from Postgres. This is especially useful for big datasets.
Here are a few hints about how to proceed:
- Dropping indexes (before injection)

Warning: you need to capture the DDL of the indexes first, using `pg_dump` or a similar tool.
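One way to capture that DDL is `pg_dump`'s schema-only mode. This sketch only prints the commands it would run (remove the `echo` to execute them); the `sgd34` schema name, the entity names, and `$PGURL` are placeholders for your own deployment:

```shell
SCHEMA=sgd34                  # placeholder: your deployment's actual sgdN schema
mkdir -p ddls
for entity in pool token; do  # placeholders; in practice: graphload list-entities /path/to/schema.graphql
  # Dry run: each pg_dump call would emit schema-only DDL (including indexes) for one table.
  echo "pg_dump --schema-only --table=${SCHEMA}.${entity} \$PGURL > ddls/${entity}.ddl"
done
```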
for entity in $(graphload list-entities /path/to/schema.graphql); do
graphman -c /etc/graph-node/config.toml index list QmABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqr $entity |grep -v -- '^-' > ${entity}.indexes
for idx in $(awk '/^[a-z]/ {print $1}' ${entity}.indexes); do
graphman -c /etc/graph-node/config.toml index drop QmABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqr $idx
done
done
- Creating indexes (after injection)

Info: you need to craft those DDL files yourself from the DDL captured earlier with `pg_dump`; you're on your own!
They should look something like this. Watch out: `sgd34` must match your deployment's actual `sgdN` schema.
-- myentity.ddl
create index pool_id_block_range_excl on "sgd34"."pool" using gist (id, block_range);
create index brin_pool on "sgd34"."pool" using brin(lower(block_range), coalesce(upper(block_range), 2147483647), vid);
create index pool_block_range_closed on "sgd34"."pool"(coalesce(upper(block_range), 2147483647)) where coalesce(upper(block_range), 2147483647) < 2147483647;
create index attr_3_0_pool_id on "sgd34"."pool" using btree("id");
Then apply them like this:
psql 'postgresql://user:[email protected]:5432/db' -f ddls/entity1.ddl
psql 'postgresql://user:[email protected]:5432/db' -f ddls/entity2.ddl
psql 'postgresql://user:[email protected]:5432/db' -f ddls/entity3.ddl
...
`graphload` also provides helper commands for index management:

graphload create-index ${DEPLOYMENT_HASH} create_indexes.ddl 'postgresql://${USER}:${PASSWORD}@127.0.0.1:5432/graph-node?sslmode=disable' path_to_schema_graphql_file
graphload drop-index ${DEPLOYMENT_HASH} 'postgresql://${USER}:${PASSWORD}@127.0.0.1:5432/graph-node?sslmode=disable' path_to_schema_graphql_file
graphload extract-index ${DEPLOYMENT_HASH} 'postgresql://${USER}:${PASSWORD}@127.0.0.1:5432/graph-node?sslmode=disable' path_to_schema_graphql_file