Large Chains Preparation

Note

The goal of this page is to provide you with the steps needed to boot a large chain without waiting for linear processes that could take weeks to complete.

Overview

To better understand the syncing process, you must first understand how data flows inside the dfuse for EOSIO process.

First, the mindreader app starts and controls a nodeos process, one that has dfuse instrumentation built in. It connects to the nodeos stdout pipe and processes the output generated by the dfuse Deep Mind code, which lives inside the nodeos process.

From this output, mindreader generates “dfuse blocks” files. Once those blocks are generated, they are pushed through a communication channel to the other apps in the stack:

  • fluxdb
  • trxdb-loader
  • search-indexer

The first two (fluxdb & trxdb-loader) use Badger, a local disk-based key-value database, while the third (search-indexer) creates on-disk searchable indexes that are used by the other search apps to serve search queries & streams.

This means that once “dfuse blocks” are created by the mindreader app, the bottleneck you will probably hit is disk throughput. We have never used this setup to sync an “existing” chain, so we are unsure how everything is going to behave together, or even whether this setup can reprocess a mid-size chain in a reasonable manner. In our testing on “local test chains”, we were able to index roughly 30 blocks/s, and as far as I remember that was only for the trxdb-loader. This metric is probably valid for both fluxdb and trxdb-loader; the search-indexer probably goes much faster as it indexes in chunks of 200 blocks.

Assuming all indexing apps run at the same time and keep up a rate of 30 blocks/s, syncing a 3M-block chain should take roughly 27 hours. This also ignores the fact that blocks continue to be created while this happens: in 27 hours, 194,400 new blocks will be created (at 2 blocks per second), which is approximately another 2 hours of reprocessing, and so on.
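
The "and so on" above is a shrinking series of catch-up rounds, and it can be sketched in a few lines of Python. This is only a rough model under the assumptions stated in the text (a constant 30 blocks/s indexing rate and 2 new blocks per second); the function name and parameters are made up for illustration and are not part of dfuse.

```python
def estimate_catch_up_hours(chain_blocks, index_rate=30.0, production_rate=2.0, min_backlog=1.0):
    """Return (total_hours, rounds) needed to reach the chain head.

    Each round indexes the current backlog; meanwhile the chain keeps
    producing blocks, which become the backlog of the next round.
    Rates are in blocks per second and are assumptions, not measurements.
    """
    if index_rate <= production_rate:
        raise ValueError("indexing must be faster than block production to ever catch up")
    total_seconds = 0.0
    rounds = 0
    backlog = float(chain_blocks)
    while backlog >= min_backlog:
        seconds = backlog / index_rate        # time to index the current backlog
        total_seconds += seconds
        backlog = seconds * production_rate   # blocks produced in the meantime
        rounds += 1
    return total_seconds / 3600.0, rounds


hours, rounds = estimate_catch_up_hours(3_000_000)
print(f"about {hours:.1f} hours over {rounds} catch-up rounds")
```

With these numbers the total lands a little under 30 hours, so the extra rounds add only a couple of hours on top of the first pass, as long as the indexing rate stays well above the block production rate.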

The bottleneck of reprocessing will definitely be Badger here. Two distributed replacements currently exist in dfuse for EOSIO:

  • Google Cloud BigTable
  • TiKV Database

There is also the possibility of performing “batch” reprocessing: you start multiple “indexing” instances, each indexing a certain chunk of the chain, like 0 - 2M, 2M - 4M, etc. (any batch size is possible here; it is highly dependent on the actual chain). While all this exists, it most probably does not work with a local “Badger” instance, since only one process can write to it at a time. To make it work, someone would need to write a small “app” that would be the single controlling process of the Badger folder, and a new KV store abstraction would need to be written to talk to this “app”.
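
To make the batch idea concrete, here is a minimal Python sketch that cuts a chain into contiguous block ranges, one per indexing instance. The helper name and the inclusive-range convention are assumptions made for this example; how each instance is actually started with its range is not covered here.

```python
def split_into_batches(first_block, last_block, batch_size):
    """Return inclusive (start, stop) block ranges covering first_block..last_block."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    ranges = []
    start = first_block
    while start <= last_block:
        # The last batch may be shorter than batch_size.
        stop = min(start + batch_size - 1, last_block)
        ranges.append((start, stop))
        start = stop + 1
    return ranges


for start, stop in split_into_batches(1, 5_000_000, 2_000_000):
    print(f"one indexing instance for blocks {start} - {stop}")
```

Ranges do not overlap, so each block is indexed by exactly one instance. The right batch size still depends on the chain and on how many instances the backing store can sustain.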

Why mention all this? So that you know there are other possibilities to sync the chain, if you are ready to experiment. Apart from Google Cloud BigTable, none of them has been battle tested yet (although we are starting to play with TiKV on some deployments).

The rest of this document is organized by specific configuration.

dfuse for EOSIO - Out of the box experience

Notes You are invited to try these instructions on a low to mid traffic chain. They are guaranteed not to work on, for example, EOS Mainnet or on chains that have existed for a long time, even if they are low-traffic.

Some users are currently trying this approach, and we will probably do it ourselves to collect further metrics.

For this sync, we are going to split the process into 3 phases:

  • Phase 1 - Launch only a mindreader app that syncs with the existing chain and creates “dfuse blocks” files
  • Phase 2 - Add the indexing apps to the mix: fluxdb, trxdb-loader and search-indexer
  • Phase 3 - Start everything else

We are going to run every phase in the same folder. Each phase starts with a different config file.

In Phase 1, we only create “dfuse blocks” files. This is mostly CPU + network bound, as you are replaying the existing chain from scratch and creating “dfuse blocks”. This runs until the chain is “live”, i.e. until you are processing live blocks.

In Phase 2, the mindreader app is going to continue its processing of live blocks, but we add the indexing apps to the mix. Those apps will not try to read live blocks. Instead, they will read “dfuse blocks” from the filesystem and index them into their various database/storage locations. We run this until everything is live and processing live blocks from mindreader.

In Phase 3, we start everything else. Indexing apps will catch up from where we terminated Phase 2, and live indexers will start to fill up with live data. All other apps are going to start as well, so you will be able to query the APIs and should have a fully indexed dfuse stack.

Bootstrap

mkdir -p ~/work/chain
cd ~/work/chain
dfuseeos init
# Answer `N` to `Do you want dfuse to run a producer node for you?`
# Enter peer addresses (try to have a few good peers in there)

Once this is done, copy the genesis.json file into the ./mindreader folder. When finished, you should have the following structure:

$ tree

Phase 1

Create a file dfuse-phase1.yaml in your folder with the following content:

start:
  args:
  - mindreader
  flags:
    mindreader-log-to-zap: false
    mindreader-merge-and-store-directly: true
    mindreader-start-failure-handler: true
    mindreader-blocks-chan-capacity: 100000
    mindreader-restore-snapshot-name: latest

    # We suggest setting a stop block manually, because this config cannot deal with fork blocks.
    # With a stop block, "mindreader" will shut down on its own once it has synced up
    # to this point.
    #mindreader-stop-block-num: 3000000

Then start up the phase 1 processing with:

dfuseeos -c dfuse-phase1.yaml start

That will kick off the creation of “dfuse blocks” and should sync with the chain relatively quickly, depending on the size of the chain and the number of transactions contained in the blocks.

Let it run until you see that nodeos processes live blocks (or stops by itself, see item 3 of the Potential Problems/Notes section). We also suggest setting a stop block (mindreader-stop-block-num), as noted in the config above.

When you are live (see item 3 of the notes), you can continue with Phase 2.

Performance Reports

Note Take all this lightly, it’s meant to give rough ideas only. There are also some performance bottlenecks at various locations that could probably be improved over time to improve those numbers (#29 for example is one).

  • Proton chain sync from 0 - 3.3M on 4 vCPUs (Xeon E5620, 2.4 GHz), 8 GB RAM, connected to a wired LAN, took roughly 7 hours.
  • Proton chain sync from 0 - 3M on 8 vCPUs (2.6 GHz Intel Core i7), 16 GB RAM, connected over WiFi (and without a great signal to the router), took roughly 13 hours.

Potential Problems/Notes

  1. Due to the usage of merge-and-store-directly, when mindreader quits, it must always start again at least 100 blocks back in time! This is the reason why mindreader-restore-snapshot-name: latest is used. However, this is problematic if no snapshot has been taken yet: mindreader restarts where it left off and there will be a bundle missing. We need to improve something for this case.
  2. When you press Ctrl-C to terminate phase 1, it is going to take a while before it actually exits. Do not hit Ctrl-C again if the process still shows activity. You can run tail -f dfuse-data/dfuse.log.json to monitor progress; you should see log lines like: {"level":"info","ts":1589529926.052727,"logger":"mindreader","caller":"mindreader/mindreader.go:235","msg":"will shutdown when block count == 0","block_count":99970}.
  3. There is a problem when we reach “live” blocks, since mindreader-merge-and-store-directly cannot deal with forks. This is a minor problem because the process will kill itself automatically upon encountering the first fork. This means that in this phase, it’s most probably impossible to keep a “mindreader” running on live blocks (unless there are 0 forks, which is possible but rare under “normal” circumstances). You can set a stop block with mindreader-stop-block-num: XXX in the config, which tells mindreader to stop once it reaches this block.
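
For item 2, the remaining work during shutdown can be read from the block_count field of the log line shown above. The following Python sketch extracts it; it relies only on the fields visible in that sample line, and the function name is invented for this example.

```python
import json

SHUTDOWN_MSG = "will shutdown when block count == 0"


def pending_block_count(log_line):
    """Return block_count from a mindreader shutdown log line, else None."""
    try:
        entry = json.loads(log_line)
    except json.JSONDecodeError:
        return None  # not a JSON log line
    if not isinstance(entry, dict):
        return None
    if entry.get("logger") != "mindreader" or entry.get("msg") != SHUTDOWN_MSG:
        return None
    return entry.get("block_count")


# Sample line copied from item 2 above.
sample = (
    '{"level":"info","ts":1589529926.052727,"logger":"mindreader",'
    '"caller":"mindreader/mindreader.go:235",'
    '"msg":"will shutdown when block count == 0","block_count":99970}'
)
print(pending_block_count(sample))
```

Feeding it the lines of dfuse-data/dfuse.log.json as they arrive shows the count going down; as long as it keeps decreasing, the process is still flushing blocks and should not be interrupted.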

Phase 2

At this point, phase 1 should be stopped, either because you added a stop block parameter or because it quit when encountering a fork. If you are lucky and it’s still running, you should take a snapshot before exiting.

curl -X POST http://localhost:13009/v1/snapshot

Then monitor the folder dfuse-data/storage/snapshots until the snapshot is written there. Once the snapshot is there, hit Ctrl-C to quit phase 1.
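
Watching the folder can be automated with a small polling loop. The Python sketch below waits until a new entry appears in a directory; the function name, timeout and polling interval are assumptions for illustration, and the snapshot file naming is deliberately not assumed.

```python
import os
import time


def wait_for_new_entry(directory, already_present, timeout_s=3600.0, poll_s=5.0):
    """Poll `directory` until an entry not in `already_present` shows up.

    Returns the sorted list of new entry names, or an empty list on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            current = set(os.listdir(directory))
        except FileNotFoundError:
            current = set()  # the folder may not exist before the first snapshot
        new_entries = sorted(current - set(already_present))
        if new_entries:
            return new_entries
        if time.monotonic() >= deadline:
            return []
        time.sleep(poll_s)


# Usage sketch: list the folder before triggering the snapshot, then wait.
# before = set(os.listdir("dfuse-data/storage/snapshots"))
# ... run the curl command above ...
# print(wait_for_new_entry("dfuse-data/storage/snapshots", before))
```

A new entry showing up does not prove the snapshot is completely written, so it is safer to wait until its size stops changing before hitting Ctrl-C.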

In Phase 2 we are still going to run mindreader, but we are going to change its configuration so it’s able to deal with live blocks. For that, we will defer the block merging process to another app, the merger. The relayer app will also be started, so that once the indexing apps have finished indexing “historical” blocks, they can be kept “warm” and process live blocks (assuming mindreader has correctly synced up to live blocks).

We will also start the fluxdb indexer (history state information), trxdb-loader (transaction and blocks database) and search-indexer (search engine indexes).

Create a file dfuse-phase2.yaml in your folder with the following content:

start:
  args:
  - mindreader
  - merger
  - relayer
  - fluxdb
  - trxdb-loader
  - search-indexer
  flags:
    fluxdb-enable-server-mode: false
    mindreader-start-failure-handler: true
    mindreader-blocks-chan-capacity: 100000
    mindreader-restore-snapshot-name: latest
    relayer-max-drift: 0
    trxdb-loader-batch-size: 100

Then start up the phase 2 processing with:

dfuseeos -c dfuse-phase2.yaml start

Monitoring the progress of the indexing processes is done manually right now, by extracting logs from the dfuse-data/dfuse.log.json file to see what’s going on and at which rate.

We use the zap-pretty tool (https://github.com/maoueh/zap-pretty) to prettify the log lines. It can be replaced by jq . but the output is less pretty and creates longer lines. If you use jq . or nothing at all, you can omit the --line-buffered flag in the grep statements; it is used to overcome a limitation of zap-pretty (that we will fix eventually).

# App fluxdb progress (check for `block_num` field, no insert rate given here)
tail -f dfuse-data/dfuse.log.json | grep --line-buffered "fluxdb" | grep --line-buffered "wrote irreversible segment" | zap-pretty

# App trxdb-loader (check for `block_num` and `block_sec` (insertion rate))
tail -f dfuse-data/dfuse.log.json | grep --line-buffered "trxdb" | grep --line-buffered "5sec AVG INSERT RATE" | zap-pretty

# App search-indexer (check for `block`, `blocks_per_sec` and `docs_per_sec` (insertion rate))
tail -f dfuse-data/dfuse.log.json | grep --line-buffered "indexer" | grep --line-buffered "processing irreversible block" | zap-pretty

# Mindreader (to see at which block it is syncing)
tail -f dfuse-data/dfuse.log.json | grep --line-buffered "mindreader" | zap-pretty
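
If you would rather collect the numbers than read them, the same filtering can be sketched in Python. The function below mirrors the double grep (both hints are matched against the raw line) and then keeps only the requested fields. The function name is made up, and the sample lines are hypothetical: they only reuse the message and field names quoted in the comments above, so real log lines will carry more fields and different values.

```python
import json


def matching_entries(lines, logger_hint, message_hint, fields):
    """Yield dicts holding `fields` for log lines that contain both hints.

    Mirrors `grep logger_hint | grep message_hint`: both hints are matched
    against the raw line, then the line is parsed as JSON.
    """
    for line in lines:
        if logger_hint not in line or message_hint not in line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or non-JSON lines
        if isinstance(entry, dict):
            yield {name: entry.get(name) for name in fields}


# Hypothetical sample lines, for illustration only.
sample_lines = [
    '{"logger":"trxdb-loader","msg":"5sec AVG INSERT RATE","block_num":1200,"block_sec":31.5}',
    '{"logger":"mindreader","msg":"something else"}',
]
for entry in matching_entries(sample_lines, "trxdb", "5sec AVG INSERT RATE", ["block_num", "block_sec"]):
    print(entry)
```

Passing an open file handle on dfuse-data/dfuse.log.json instead of the sample list processes the whole log in one pass, which is handy for charting the insertion rate after the fact.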

Performance Reports

Note Take all this lightly, it’s more to give rough ideas.

  • Proton chain sync from 0 - 3.3M on 4 vCPUs (Xeon E5620, 2.4 GHz), 8 GB RAM, RAID6 SAN (exact number unavailable; 2 hours was reported but I imagine it’s much less than that). No breakdown per app.
  • Proton chain sync from 0 - 3M on 8 vCPUs (2.6 GHz Intel Core i7), 16 GB RAM, SSD disk:
    • App search-indexer took roughly 35m to index all the data
    • App trxdb-loader took roughly 55m to index all the data
    • App fluxdb took roughly 60m to index all the data
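
To compare these reports with other runs, the durations can be converted into average indexing rates. The short Python sketch below does that for the second report, taking the 3M blocks and the per-app minutes at face value; since the reports themselves are rough, the resulting rates are rough as well.

```python
def blocks_per_second(blocks, minutes):
    """Average indexing rate for `blocks` processed in `minutes`."""
    return blocks / (minutes * 60.0)


# Durations (in minutes) taken from the second report above.
reported_minutes = {"search-indexer": 35, "trxdb-loader": 55, "fluxdb": 60}

for app, minutes in reported_minutes.items():
    rate = blocks_per_second(3_000_000, minutes)
    print(f"{app}: roughly {rate:.0f} blocks/s")
```

These averages depend heavily on how many transactions the blocks contain, so they should not be extrapolated to busier chains.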