Data Stores & Artifacts
Note: The goal of this page is to help you understand all the data artifacts consumed and produced by dfuse, the different databases, and the different types of storage.
There are two types of data stores used by the dfuse Platform:
- Object stores, for small or large files. These use the dfuse dstore abstraction library to support Azure, GCP, AWS, Minio, and local filesystems.
- Simple key/value storage databases. These use the kvdb key/value database abstraction, with support for Google Cloud Bigtable, TiKV and Badger.
Different databases are needed for different components of dfuse for EOSIO:
A kvdb-backed key/value store that stores block and transaction information, pre-sorted and easily searchable by prefix or key range.
- This database can be written to in any order, segment by segment.
- For a two-year-old, EOS Mainnet-like deployment, this database will easily reach dozens of terabytes, so choose your backend accordingly.
- The transactions stored in this database can be filtered (docs) to save on storage.
- The dfuseeos tools command has tools to verify the integrity of such a database, to ensure a contiguous block history.
- It is written to by the trxdb-loader component.
- See parallel processing docs for more info on parallel ingestion.
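The two properties described above (sorted keys searchable by prefix, and out-of-order segment ingestion that must end up contiguous) can be illustrated with a small sketch. The key layout used here (`blk:` and `trx:` prefixes, big-endian block numbers) and the function names are assumptions made for illustration; they are not the actual trxdb schema or the kvdb API.

```python
import bisect
import struct


def block_key(num: int) -> bytes:
    # Hypothetical key layout: a prefix plus the big-endian block number,
    # so that lexicographic key order matches numeric block order.
    return b"blk:" + struct.pack(">Q", num)


def prefix_scan(sorted_keys, prefix: bytes):
    """Return every key in sorted_keys that starts with prefix."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        matches.append(key)
    return matches


def find_holes(sorted_keys):
    """Return (first_missing, last_missing) ranges between stored blocks."""
    nums = [struct.unpack(">Q", k[4:])[0] for k in prefix_scan(sorted_keys, b"blk:")]
    holes = []
    for prev, cur in zip(nums, nums[1:]):
        if cur > prev + 1:
            holes.append((prev + 1, cur - 1))
    return holes


# Two segments ingested out of order: blocks 200-299 first, then 1-99.
keys = sorted(
    [block_key(n) for n in range(200, 300)]
    + [block_key(n) for n in range(1, 100)]
    + [b"trx:abcd"]
)
holes = find_holes(keys)  # the segment 100-199 has not been loaded yet
```

A real backend such as Bigtable or TiKV performs the prefix scan server-side; the sorted in-memory list only stands in for that ordering guarantee.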
A kvdb-backed key/value store that stores changes to the blockchain state (internal, as well as contract state), such as tables, rows, and snapshots of such tables. It uses purpose-built algorithms to enable snapshotting of all tables, at all block heights.
The search components require a small etcd cluster (typically 3 nodes) for service discovery between search components. See more details about
A memcached server will help a search cluster increase performance for larger deployments.
It stores small (< 1 KB) roaring bitmaps as values, with hashed normalized queries as keys.
See the search-memcached component in the documentation.
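As a rough illustration of "hashed normalized queries as keys", the sketch below derives a cache key from a query string. The normalization rule (collapse whitespace and sort space-separated terms, which is only safe for purely conjunctive queries), the choice of SHA-256, and the function names are assumptions for this example, not the scheme dfuse Search actually uses. A plain dict stands in for memcached, and an opaque byte string stands in for a serialized roaring bitmap.

```python
import hashlib


def normalize_query(query: str) -> str:
    # Assumed normalization: collapse whitespace and sort the terms so that
    # equivalent conjunctive queries map to the same string.
    return " ".join(sorted(query.split()))


def cache_key(query: str) -> str:
    # Hash the normalized query to get a short, fixed-length key.
    return hashlib.sha256(normalize_query(query).encode("utf-8")).hexdigest()


cache = {}  # stand-in for the memcached server

# Placeholder bytes standing in for a small serialized roaring bitmap.
cache[cache_key("receiver:eosio.token  action:transfer")] = b"\x01\x02\x03"

# The same query with terms in a different order hits the same entry.
hit = cache.get(cache_key("action:transfer receiver:eosio.token"))
```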
These are artifacts managed by
dfuse for EOSIO knows how to manipulate them through APIs on the node-manager or mindreader process.
This file is an append-only file. Things are written sequentially in there, only when they become irreversible. A second small database alongside the
blocks folder, called
On small, low-traffic networks, this will be quite tiny: a few megabytes.
A small index file called blocks/blocks.index is also append-only, and stores fixed-size pointers to offsets in blocks.log. Combined, these two files allow nodeos to quickly fetch unexecuted block data and serve it on the p2p network upon request.
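The reason a fixed-size index makes lookups fast can be shown with a minimal sketch: the position of the n-th pointer is simple arithmetic, so reading a block takes one seek in the index and one in the log. The 8-byte little-endian pointer, the 4-byte length prefix, and the zero-based position `n` are assumptions for illustration; this is not the actual on-disk format of blocks.log or blocks.index.

```python
import io
import struct

# Assumed layout: one 8-byte little-endian offset per block in the index.
POINTER = struct.Struct("<Q")
# Assumed layout: each log entry is a 4-byte length followed by the payload.
LENGTH = struct.Struct("<I")


def append_block(log, index, payload: bytes) -> None:
    """Append a payload to the log and its offset to the index."""
    offset = log.seek(0, io.SEEK_END)
    log.write(LENGTH.pack(len(payload)) + payload)
    index.seek(0, io.SEEK_END)
    index.write(POINTER.pack(offset))


def read_block(log, index, n: int) -> bytes:
    """Fetch the n-th appended payload (zero-based) using the index."""
    index.seek(n * POINTER.size)  # fixed-size entries: position is n * 8
    (offset,) = POINTER.unpack(index.read(POINTER.size))
    log.seek(offset)
    (length,) = LENGTH.unpack(log.read(LENGTH.size))
    return log.read(length)


# In-memory buffers stand in for the two append-only files.
log, index = io.BytesIO(), io.BytesIO()
for payload in (b"block-0", b"block-one", b"block-2"):
    append_block(log, index, payload)

second = read_block(log, index, 1)
```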
This is a mmap’d file that stores the current live state of the blockchain, covering both block headers and all contracts’ data. This is managed by the
chain-state-db-guard-size-mb), and must be larger than what the underlying chain configuration allows to be allocated. If it is smaller, and transactions on the chain ask to write to contract storage, the node will cleanly shut down (that’s where the guard comes in).
This file is also a sparse file, meaning it will be allocated the full amount of what is set in chain-state-db-size-mb, but might actually occupy far less space on the disk.
For example, after 2 years, EOS Mainnet offers more than 72GB of RAM on the RAM market, but only ~8GB are actually used by participating nodeos nodes. The rest of the file is filled with sparse zeroes.
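The difference between the allocated size and the space actually occupied can be observed with a short sketch. It creates a throwaway sparse file (the file name and the 64 MiB size are arbitrary choices for this example) and compares its apparent size with the blocks reported by the filesystem. Whether the file really ends up sparse depends on the filesystem, and `st_blocks` is only available on POSIX systems, so the on-disk figure is reported as `None` elsewhere.

```python
import os
import tempfile


def make_sparse_file(path: str, size_bytes: int, used: bytes) -> None:
    """Write a little real data, then extend the file without writing zeroes."""
    with open(path, "wb") as f:
        f.write(used)
        f.truncate(size_bytes)


def sizes(path: str):
    """Return (apparent_size, bytes_on_disk or None if unavailable)."""
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)  # 512-byte units on POSIX
    on_disk = None if blocks is None else blocks * 512
    return st.st_size, on_disk


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "state.bin")  # hypothetical file name
    make_sparse_file(path, 64 * 1024 * 1024, b"live state")
    apparent, on_disk = sizes(path)

print("apparent size:", apparent, "bytes on disk:", on_disk)
```

Backup tools that are not sparse-aware copy the apparent size, which is why the distinction matters when moving these files around.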
See the backup and recovery section for efficient ways to back up and recover those files.
portable state snapshots
These files are generated by nodeos upon request, either through the command line or through the
They contain all the data in the
nodeos, in a binary format that is portable, versioned and stable. A snapshot can then be used to boot a new nodeos instance, and fill a
These are dfuse-specific artifacts.
In general, the dfuse Platform uses Protocol Buffers version 3 for serialization.
executed merged blocks files
Also called 100-blocks files, merged blocks files, or merged bundles. These terms are all used interchangeably here.
They are produced by mindreader, in catch-up mode (set as such with certain flags), or by the merger in an HA setup. In the latter case, mindreader contributes one-block files to the merger instead, and the merger collates all of those into a single bundle.
These 100-blocks files can contain more than 100 blocks (because they can include multiple versions of a given block number), but never fewer (to ensure continuity).
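The "more than 100, but never fewer" rule can be sketched as a grouping step followed by a completeness check. The sketch assumes that a bundle covers the block numbers from its base (a multiple of 100) up to base + 99, and it represents each one-block file as a `(block_num, block_id)` pair; both are simplifications for illustration and not the merger's actual data model.

```python
from collections import defaultdict

BUNDLE_SIZE = 100


def bundle_base(block_num: int) -> int:
    # Assumed convention: a bundle starts at a multiple of BUNDLE_SIZE.
    return block_num - (block_num % BUNDLE_SIZE)


def bundle(one_block_files):
    """Group (block_num, block_id) pairs into bundles keyed by base number."""
    bundles = defaultdict(list)
    for num, block_id in sorted(set(one_block_files)):
        bundles[bundle_base(num)].append((num, block_id))
    return dict(bundles)


def is_complete(base: int, blocks) -> bool:
    """A bundle is complete when every block number in its range is present."""
    seen = {num for num, _ in blocks}
    return all(n in seen for n in range(base, base + BUNDLE_SIZE))


# Blocks 100-199, plus a second (forked) version of block 150.
files = [(n, f"{n}a") for n in range(100, 200)] + [(150, "150b")]
bundles = bundle(files)

# 101 entries: every block number is covered, and 150 appears twice.
count = len(bundles[100])
complete = is_complete(100, bundles[100])
```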
The EOSIO-specific decoded Block objects are what circulate amongst all processes that work with executed block data.
These are transient files, meant to ensure that the merger gathers all visible forks from the mindreader instances, in an HA setup.
They contain one bstream.Block, as serialized Protobuf (see links above).
The merger consumes them, bundles them into executed merged blocks files (100-blocks files) and stores them in dstore storage, for consumption by most other processes.
They are produced by the search-indexer process, and consumed by the
They contain pointers to what is stored in the trxdb key/value store, and looked up by a transaction ID prefix.
They do not contain the actual transaction data, only the indexes to allow for fast search. They also only contain search terms specified to the
forkresolver components of
abicodec ABI cache
The abicodec component primes a local cache of all the ABI changes throughout the history of the chain. It feeds itself off of a dfuse Search endpoint. Once it has that local cache, it stores it to a dstore location, to start faster next time.
At the time of writing, this file is in an opaque binary-packed format that only abicodec can read and write.