
Substreams sink to write CSV files compatible with graph-node postgresql database


streamingfast/substreams-sink-graph-load


Substreams to Subgraphs high-speed injector

This tool enables high-speed writes from a Substreams-based Subgraph and is 100% compatible with graph-node's injection behavior (writing to postgres), including proofs of indexing.

It is an optional injection method that trades off high-speed injection for slightly more involved devops work.

High-speed here means one or two orders of magnitude faster.

(Repository was previously named substreams-sink-graphcsv)

Requirements

  • A Substreams package with a map module outputting proto:substreams.entity.v1.EntityChanges.
    • By convention, we use the module name graph_out
    • The substreams-entity-change crate contains the Rust objects and helpers that accelerate development of Substreams-based Subgraphs.

Install

go install github.com/streamingfast/substreams-graph-load/cmd/graphload@latest

Use it to quickly fill your subgraph database

Producing CSV from substreams data

Note: to connect to Substreams you will need an authentication token; see the StreamingFast authentication guide.
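As a sketch of one common setup: tools in the Substreams ecosystem typically read the token from the SUBSTREAMS_API_TOKEN environment variable. Whether graphload reads this exact variable is an assumption; check `graphload run --help` to confirm.

```shell
# Assumption: graphload follows the usual Substreams convention of reading
# the API token from the SUBSTREAMS_API_TOKEN environment variable.
export SUBSTREAMS_API_TOKEN="your-token-here"

# Fail early if the token is missing, before launching a long-running job.
[ -n "$SUBSTREAMS_API_TOKEN" ] || { echo "SUBSTREAMS_API_TOKEN is not set" >&2; exit 1; }
```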
  1. Determine the highest block that you want to index, aligned with your "bundle size". If your bundle size is 10000 and the Ethereum mainnet head block is at 17238813, your STOP_BLOCK will be 17230000.
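The alignment is plain integer arithmetic: round the head block down to the nearest bundle boundary. A small sketch, using the example numbers above:

```shell
# Round the current head block down to the nearest bundle boundary.
HEAD_BLOCK=17238813
BUNDLE_SIZE=10000
STOP_BLOCK=$(( HEAD_BLOCK - HEAD_BLOCK % BUNDLE_SIZE ))
echo "$STOP_BLOCK"   # 17230000
```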

  2. Write the entities to disk, from Substreams:

graphload run --chain-id=ethereum/mainnet --graphql-schema=/path/to/schema.graphql --bundle-size=10000 /tmp/substreams-entities mainnet.eth.streamingfast.io:443 ./substreams-v0.0.1.spkg graph_out 17230000
  3. Produce the CSV files based on the already-processed dump of entities:
for entity in $(graphload list-entities /path/to/schema.graphql); do 
    graphload tocsv /tmp/substreams-entities /tmp/substreams-csv $entity 17230000 --bundle-size=10000 --graphql-schema=/path/to/schema.graphql
done
  4. Verify that all CSV files were produced (from the start block, rounded down to the bundle size, up to the stop block):
ls /tmp/substreams-csv/*
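Beyond eyeballing the listing, you can sanity-check the bundle count per entity: the expected number of bundles is (STOP_BLOCK - START_BLOCK) / BUNDLE_SIZE. This is only a sketch; the per-entity directory layout and one-file-per-bundle assumption should be adapted to what `tocsv` actually wrote.

```shell
# Sketch: count the CSV files in each entity directory and compare against the
# expected number of bundles. Assumes one CSV file per bundle per entity.
START_BLOCK=12360000   # start block, rounded down to the bundle size
STOP_BLOCK=17230000
BUNDLE_SIZE=10000
EXPECTED=$(( (STOP_BLOCK - START_BLOCK) / BUNDLE_SIZE ))

for dir in /tmp/substreams-csv/*/; do
    count=$(find "$dir" -name '*.csv' | wc -l)
    echo "$dir: $count files (expected $EXPECTED)"
done
```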

Preparing your graph-node for injection

  1. Stop indexing on your node:
graphman -c /etc/graph-node/config.toml unassign QmABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqr
  2. Truncate the data in your newly deployed subgraph, using graphman rewind or SQL TRUNCATE:
# for a subgraph/substreams that start at block `12369620`

## Before graphman commit b3e8ad1c1b2446c36b93a47b301bceca69f71dca

# Select the block number and hash (from your favorite block explorer) that is one block BELOW the actual start block
graphman -c /etc/graph-node/config.toml rewind 0x6a3bb2ef0a20f5503495238e54fef236659f56f1c57e1602b0de2b3d799fe154 12369620 QmABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqr --force

# If your subgraph starts at block 0, you cannot use this 'rewind' technique. You will have to call `truncate` on each of the tables from a Postgresql shell.

## After graphman commit b3e8ad1c1b2446c36b93a47b301bceca69f71dca
graphman -c /etc/graph-node/config.toml rewind --start-block QmABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqr
  3. If you are removing indexes, now is the time to do it (see section Postgresql indexes speedup).

Injecting into postgres and restarting graph-node indexing:

  1. Inject the CSV files into Postgres; they are read from /tmp/substreams-csv/{entity}:
for entity in $(graphload list-entities /path/to/schema.graphql); do 
    graphload inject-csv QmABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqr /tmp/substreams-csv $entity /path/to/schema.graphql 'postgresql://user:[email protected]:5432/database' 12360000 17230000
done
  2. Inform graph-node of the latest indexed block:
graphload handoff QmABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqr 0x0bdf3e2805450d917fbedb4d6f930d34261c3189eb14274e0b113302b28e59fe 17229999 'postgresql://user:[email protected]:5432/database'
  3. If you removed indexes, now is the time to recreate them (see section Postgresql indexes speedup).

  4. Restart graph-node indexing:

graphman -c /etc/graph-node/config.toml reassign QmABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqr default

Postgresql indexes speedup

The inject-csv command can run even faster if the indexes have been dropped from postgresql. This is especially interesting for big datasets.

Here are a few hints about how to proceed:

  • dropping indexes (before injection)

Warning: before dropping anything, save the DDL of the indexes, using pg_dump or a similar tool.

for entity in $(graphload list-entities /path/to/schema.graphql); do
    graphman -c /etc/graph-node/config.toml index list QmABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqr $entity | grep -v -- '^-' > ${entity}.indexes
    for idx in $(awk '/^[a-z]/ {print $1}' ${entity}.indexes); do 
        graphman -c /etc/graph-node/config.toml index drop QmABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqr $idx
    done
done 
  • creating indexes (after injection)

Info: you need to craft those DDL files yourself from the previous pg_dump backup; you're on your own!

They should look something like this; make sure that sgd34 matches your deployment's actual sgd schema:

-- myentity.ddl
create index pool_id_block_range_excl on "sgd34"."pool" using gist (id, block_range);
create index brin_pool on "sgd34"."pool" using brin(lower(block_range), coalesce(upper(block_range), 2147483647), vid);
create index pool_block_range_closed on "sgd34"."pool"(coalesce(upper(block_range), 2147483647)) where coalesce(upper(block_range), 2147483647) < 2147483647;
create index attr_3_0_pool_id on "sgd34"."pool" using btree("id");

Then apply them like this:

psql 'postgresql://user:[email protected]:5432/db' -f ddls/entity1.ddl
psql 'postgresql://user:[email protected]:5432/db' -f ddls/entity2.ddl
psql 'postgresql://user:[email protected]:5432/db' -f ddls/entity3.ddl
...

Managing indexes with graphload

Creating indexes with graphload command

graphload create-index ${DEPLOYMENT_HASH} create_indexes.ddl "postgresql://${USER}:${PASSWORD}@127.0.0.1:5432/graph-node?sslmode=disable" path_to_schema_graphql_file

Deleting indexes with graphload command

graphload drop-index ${DEPLOYMENT_HASH} "postgresql://${USER}:${PASSWORD}@127.0.0.1:5432/graph-node?sslmode=disable" path_to_schema_graphql_file

Extracting indexes with graphload command

graphload extract-index ${DEPLOYMENT_HASH} "postgresql://${USER}:${PASSWORD}@127.0.0.1:5432/graph-node?sslmode=disable" path_to_schema_graphql_file
