Blockchain is one of the top technology trends of our time. It was originally invented by Satoshi Nakamoto as the public, distributed transaction ledger for the cryptocurrency Bitcoin. As the first phase of an ongoing project, I have been experimenting with visualizing the Bitcoin blockchain as a graph in Neo4j. With this article, I hope to help you do the same.
Why a Graph Database?
I have worked with MySQL, a relational database management system (RDBMS), only a few times. In that time, I have learnt the following:
- Relational databases have a schema that describes the structure of the database. All data in the database must conform to the schema; otherwise, the schema has to be changed first.
- Relationships between tables are defined when the tables are first created.
- This implies that the structure of all future data must be known when the database is created.
- Relational databases use primary and foreign keys to express references between tables. A foreign key value must match a primary key value in the parent table.
- Primary keys are often auto-generated numbers, so the foreign keys that refer to them carry no human-readable meaning. Human-readable values can only be retrieved by running a query that joins the tables on the primary and foreign keys.
- With large amounts of data, multiple joins can degrade performance.
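The join problem described above can be sketched with Python's built-in sqlite3 module. The table names (addresses, payments), the from/to columns and the sample values are hypothetical, and the model is deliberately simpler than Bitcoin's real input/output structure. The sketch shows three things: a payment row stores only auto-generated ids, a join is needed to recover readable addresses, and a foreign key must match an existing primary key.

```python
import sqlite3

# Hypothetical, simplified schema: real Bitcoin transactions have inputs and
# outputs rather than a single sender and receiver.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE addresses (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,  -- auto-generated surrogate key
    address TEXT NOT NULL                       -- the human-readable value
);
CREATE TABLE payments (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    from_id    INTEGER NOT NULL REFERENCES addresses(id),  -- foreign key
    to_id      INTEGER NOT NULL REFERENCES addresses(id),  -- foreign key
    amount_btc REAL NOT NULL
);
""")

conn.executemany("INSERT INTO addresses (address) VALUES (?)",
                 [("alice-addr",), ("bob-addr",)])  # receive ids 1 and 2
conn.execute("INSERT INTO payments (from_id, to_id, amount_btc) VALUES (1, 2, 0.5)")

# On its own, the payments table only holds opaque numbers.
raw_rows = conn.execute("SELECT from_id, to_id, amount_btc FROM payments").fetchall()

# Readable values need a join on primary and foreign keys (one join per reference).
readable = conn.execute("""
    SELECT sender.address, receiver.address, p.amount_btc
    FROM payments AS p
    JOIN addresses AS sender   ON sender.id   = p.from_id
    JOIN addresses AS receiver ON receiver.id = p.to_id
""").fetchall()

# A foreign key must match an existing primary key in the parent table.
fk_error = None
try:
    conn.execute("INSERT INTO payments (from_id, to_id, amount_btc) VALUES (1, 99, 1.0)")
except sqlite3.IntegrityError as exc:
    fk_error = str(exc)
```

Here raw_rows contains only the numbers (1, 2, 0.5), while the joined query returns the two address strings. Every additional reference you want to resolve adds another join of this kind.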
A graph database:
- Requires no schema. It uses nodes and relationships to store data, and the data itself determines the structure of the nodes and their relationships.
- In an RDBMS, by contrast, each node would be a separate record in one table and the relationships would be defined in another table.
- Relationships have names and properties, which makes graph databases well suited to connected data such as social networks.
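To make the node and relationship model concrete, here is a toy in-memory property graph in Python. The Graph, Node and Relationship classes are hypothetical and only illustrate the idea; they are not Neo4j's API or storage format. Nodes carry arbitrary properties without a schema, and relationships have a name (type) and their own properties, as in the social-network example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality/hash, so nodes can be dict keys
class Node:
    label: str
    props: dict  # free-form properties: no schema, each node carries its own keys


@dataclass
class Relationship:
    rel_type: str  # the relationship's name, e.g. "FRIEND_OF"
    start: Node
    end: Node
    props: dict = field(default_factory=dict)


class Graph:
    """Toy in-memory property graph (hypothetical; not Neo4j's API or storage)."""

    def __init__(self):
        self.outgoing = {}  # Node -> list of Relationships that start at it

    def add_node(self, label, **props):
        node = Node(label, props)
        self.outgoing[node] = []
        return node

    def relate(self, start, rel_type, end, **props):
        rel = Relationship(rel_type, start, end, props)
        self.outgoing[start].append(rel)
        return rel

    def neighbours(self, node, rel_type):
        """Follow only the outgoing relationships of the given type."""
        return [r.end for r in self.outgoing[node] if r.rel_type == rel_type]


g = Graph()
alice = g.add_node("Person", name="Alice")
bob = g.add_node("Person", name="Bob", city="Berlin")  # extra key, no schema change
carol = g.add_node("Person", name="Carol")
friendship = g.relate(alice, "FRIEND_OF", bob, since=2015)
g.relate(alice, "FOLLOWS", carol)

friends = [n.props["name"] for n in g.neighbours(alice, "FRIEND_OF")]
```

Bob's node has a city property that Alice's lacks, and nothing had to change to allow it. The FRIEND_OF relationship stores its own since property, which is the kind of data an RDBMS would need a separate linking table to hold.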
In summary, a graph database is better suited to visualizing the Bitcoin blockchain because a graph query touches only the data reachable through the specified relationships. An RDBMS, on the other hand, would perform poorly when JOINing multiple tables that may hold hundreds of thousands of transaction records.
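The argument in this summary can be illustrated with a rough sketch that counts how many records are examined while following a chain of spends for three hops. The data is hypothetical, and the relational side is modelled as an unindexed scan of a single edge table. Real relational databases index foreign-key columns, so this exaggerates the gap. The sketch is only meant to show the locality idea: a graph traversal reads the current node's own relationships instead of searching a shared table.

```python
from collections import defaultdict

# Hypothetical spend chain: (from_tx, to_tx) means an output of from_tx is spent by to_tx.
chain = [(f"tx{i}", f"tx{i + 1}") for i in range(1_000)]
unrelated = [(f"other{i}", f"other{i + 1}") for i in range(10_000)]
edge_table = chain + unrelated  # relational view: one flat relationship table

# Graph view: each node keeps direct references to its own outgoing edges.
adjacency = defaultdict(list)
for src, dst in edge_table:
    adjacency[src].append(dst)


def follow_by_scan(table, start, hops):
    """Each hop scans the whole table, like an unindexed join."""
    examined, current = 0, start
    for _ in range(hops):
        nxt = None
        for src, dst in table:
            examined += 1
            if src == current:
                nxt = dst
        current = nxt
        if current is None:
            break
    return current, examined


def follow_by_adjacency(adj, start, hops):
    """Each hop reads only the current node's own edges."""
    examined, current = 0, start
    for _ in range(hops):
        out = adj.get(current, [])
        examined += len(out)
        if not out:
            return None, examined
        current = out[0]  # the sample data has one outgoing edge per node
    return current, examined


scan_result = follow_by_scan(edge_table, "tx0", hops=3)
graph_result = follow_by_adjacency(adjacency, "tx0", hops=3)
```

Three hops from tx0 reach tx3 either way. The scan, however, examines 33,000 rows (11,000 per hop), while the adjacency walk examines 3 edges. The unrelated rows cost the scan time on every hop but are never touched by the traversal.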
To successfully import the Bitcoin blockchain into Neo4j, the following steps need to be completed:
- Install Docker on an Ubuntu VM
- I use VMware Workstation 15 Pro and Ubuntu 18.04 Desktop, with a primary disk partition that has a maximum size of 983 GB.
- Get the docker compose file from https://gist.github.com/straumat/54edc240554f84c71b81a8d926b8f5be
- Rename the file to docker-compose.yml
- Make the following modifications in the docker-compose.yml file:
- Add two arguments:
- Download docker-compose
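- Run the following command in the directory that contains docker-compose.yml. The COMPOSE_HTTP_TIMEOUT=240 prefix raises docker-compose's HTTP timeout from its default of 60 seconds to 240 seconds: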
COMPOSE_HTTP_TIMEOUT=240 docker-compose up
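If you prefer to script the last step, the sketch below wraps the same command with Python's subprocess module. The function names are hypothetical. The sketch assumes the docker-compose (v1) binary is on PATH and that docker-compose.yml is in the directory you pass in. It sets COMPOSE_HTTP_TIMEOUT in the child process's environment, exactly as the shell prefix does.

```python
import os
import shutil
import subprocess
from pathlib import Path


def compose_up_command(timeout_seconds=240):
    """Build argv and environment equivalent to `COMPOSE_HTTP_TIMEOUT=240 docker-compose up`."""
    env = dict(os.environ)  # keep PATH, DOCKER_HOST, etc.
    env["COMPOSE_HTTP_TIMEOUT"] = str(timeout_seconds)  # seconds; docker-compose defaults to 60
    return ["docker-compose", "up"], env


def compose_up(project_dir, timeout_seconds=240):
    """Run `docker-compose up` in project_dir, which must contain docker-compose.yml."""
    project = Path(project_dir)
    if not (project / "docker-compose.yml").is_file():
        raise FileNotFoundError(f"no docker-compose.yml in {project.resolve()}")
    if shutil.which("docker-compose") is None:
        raise FileNotFoundError("docker-compose is not on PATH")
    argv, env = compose_up_command(timeout_seconds)
    # Blocks until the containers stop; check=True raises on a non-zero exit code.
    return subprocess.run(argv, cwd=str(project), env=env, check=True)
```

Calling compose_up(".") from the directory holding docker-compose.yml is equivalent to the shell command above. Like that command, it runs in the foreground until the containers stop.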
Here are some snapshots taken while I was working on this:
Here are a few points on the overall process:
- It took me about 15 hours to sync up to the latest block in the Bitcoin blockchain, and it will take a few weeks to import all the blocks into Neo4j.
- The import into Neo4j can be interrupted at any time; when restarted, it resumes from where it left off.
- Analysis can start on the blocks already imported into Neo4j. There is no need to wait for the entire blockchain to be imported.
- This method is useful only if you’re not in a hurry to analyze the blockchain.
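Resumable behaviour like this is usually achieved with checkpointing. The article does not say how the import tool implements it, so the following is only a hypothetical sketch of the general technique. After each block is written, the last imported height is saved to a checkpoint file, and a restart continues from the next height. A plain dict stands in for Neo4j, and the stop_after parameter simulates an interruption.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the height of the last imported block, or -1 if nothing was imported yet."""
    if not os.path.exists(path):
        return -1
    with open(path) as f:
        return json.load(f)["last_height"]


def save_checkpoint(path, height):
    # Write to a temp file, then rename, so an interruption never leaves a half-written checkpoint.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as f:
        json.dump({"last_height": height}, f)
    os.replace(tmp, path)


def import_blocks(blocks, store, checkpoint_path, stop_after=None):
    """Import blocks in height order, resuming after the last checkpointed height.

    `store` is a hypothetical stand-in for the graph database;
    `stop_after` simulates an interruption after that many blocks.
    """
    start = load_checkpoint(checkpoint_path) + 1
    imported = 0
    for height in range(start, len(blocks)):
        if stop_after is not None and imported == stop_after:
            return  # simulated interruption
        store[height] = blocks[height]  # idempotent write: re-importing overwrites
        save_checkpoint(checkpoint_path, height)
        imported += 1


# Demo: interrupt after 4 blocks, then resume.
blocks = [f"block-{h}" for h in range(10)]
store = {}
with tempfile.TemporaryDirectory() as d:
    ckpt = os.path.join(d, "checkpoint.json")
    import_blocks(blocks, store, ckpt, stop_after=4)
    after_interrupt = sorted(store)  # blocks 0-3 are already available for analysis
    import_blocks(blocks, store, ckpt)  # resumes at height 4
    final_height = load_checkpoint(ckpt)
```

A crash can happen between writing a block and saving the checkpoint, so the last block may be written twice on restart. The writes therefore need to be idempotent; in Cypher this is usually done with MERGE rather than CREATE. The blocks imported before the interruption are already in the store, which is why analysis can begin before the import finishes.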
Feature Image Credits: https://neo4j.com/blog/import-bitcoin-blockchain-neo4j/