Bitcoin is a peer-to-peer, decentralized electronic currency system that operates with no central authority or banks; managing transactions and issuing bitcoins are carried out collectively by the network. Bitcoins are digital coins that can be sent through the Internet. Because the system is decentralized, anyone can create a wallet and buy and sell bitcoins without oversight or regulation. A Bitcoin transaction is a signed message between two bitcoin addresses. It does not explicitly identify the payer or payee, so there is potential for fraudulent transactions. However, the complete history of all transactions ever performed in the Bitcoin network, called the “blockchain”, is public and replicated on each node.
The data contained in the Bitcoin (BTC) network is difficult to analyze manually, but it can yield a great deal of relevant information. We want to use connected analysis to look at the BTC data. Over time, this should let us understand and identify normal behavior patterns. One key to pattern identification is the ability to collect, analyze and visualize data to reveal the relationships within it. Using Neo4j, we can model the BTC data as a graph that captures those relationships, for example the relationships between bitcoins, transactions, blocks, and wallets. Using Neo4j’s Cypher language, we can query the data for patterns of activity, visualize the results, and export the data for analysis with machine learning algorithms.
There are some fundamental differences between traditional currency and Bitcoins that make it difficult to track where Bitcoins are going. The first is that Bitcoins are not single entities: transactions are done in fractions of Bitcoins, and Bitcoins can be combined from multiple sources and forwarded on as payments. The second is that Bitcoin wallets and addresses are trivial to generate. A single user could create a large number of addresses in order to move money around, and a large amount of Bitcoins could be hidden across a large number of wallets. There are also services that pool BTC into a collection of wallets with other users’ Bitcoins in order to launder them. Given all these potential complexities, we will treat the BTC network as a graph.
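To make the "not single entities" point concrete, here is a minimal Python sketch of how one payment might combine amounts received from several sources and return the remainder as change. It is an illustration only, not how any real wallet chooses its inputs: the function name `build_payment`, the address labels, and the amounts are all hypothetical, and transaction fees are ignored. Amounts are held as integers in satoshis (1 BTC = 100,000,000 satoshis) to avoid floating-point rounding.

```python
SATOSHIS_PER_BTC = 100_000_000  # 1 BTC = 100,000,000 satoshis


def build_payment(received, amount):
    """Pick previously received amounts, in order, until they cover `amount`.

    `received` is a list of (source_address, value_in_satoshis) pairs.
    Returns (inputs_used, change). Hypothetical helper; fees are ignored.
    """
    inputs_used = []
    total = 0
    for source_address, value in received:
        if total >= amount:
            break  # enough has been collected already
        inputs_used.append((source_address, value))
        total += value
    if total < amount:
        raise ValueError("insufficient funds")
    return inputs_used, total - amount


# Hypothetical amounts received earlier from three different addresses.
received = [
    ("addr_A", 30_000_000),  # 0.3 BTC
    ("addr_B", 50_000_000),  # 0.5 BTC
    ("addr_C", 40_000_000),  # 0.4 BTC
]

# Paying 0.7 BTC needs the first two amounts combined; the rest is change.
inputs_used, change = build_payment(received, 70_000_000)
print(inputs_used)
print(change / SATOSHIS_PER_BTC, "BTC change")
```

Even in this toy case a single payment touches three earlier transfers (two inputs plus a change output), which is why following the money quickly becomes a graph traversal problem.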
The Bitcoin home page has a video explaining how Bitcoins work. At the 0:06 mark of the video, you can see how bitcoins travel across the interconnected network.
Let’s look at how we can model Bitcoin transactions as a graph. Blocks are connected to one another by a :PREVIOUS relationship, which we can follow to walk from block to block. Each transaction is contained within a block, and a block will contain several transactions. A transaction consists of an address REDEEMING some bitcoins as part of an IncomingTransaction. All redeemed bitcoins are tied to a previous transaction that shows how the redeeming party received those bitcoins. Redeemed bitcoins are then sent to an address. In our model, we have created IncomingTransaction and OutgoingTransaction nodes, which allow us to easily count and sum the incoming and outgoing transactions for an address.
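The model described above can be sketched with a small in-memory graph in Python. This is only a stand-in to show the shape of the data, not Neo4j and not the loading code used in this series. The `Graph` class, the node ids, and the values are hypothetical. Of the relationship types, only PREVIOUS and REDEEMING are named in the description; CONTAINS, PART_OF, HAS_OUTPUT and SENT_TO, along with the edge directions, are assumptions made for illustration.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory property graph (hypothetical stand-in for Neo4j)."""

    def __init__(self):
        self.nodes = {}                 # node id -> (label, properties)
        self.edges = defaultdict(list)  # (source id, rel type) -> [target ids]

    def add_node(self, node_id, label, **props):
        self.nodes[node_id] = (label, props)

    def add_edge(self, source, rel_type, target):
        self.edges[(source, rel_type)].append(target)

    def targets(self, source, rel_type):
        return self.edges.get((source, rel_type), [])


g = Graph()

# Two blocks linked by PREVIOUS; the newer block contains one transaction.
g.add_node("block1", "Block")
g.add_node("block2", "Block")
g.add_edge("block2", "PREVIOUS", "block1")
g.add_node("tx1", "Transaction")
g.add_edge("block2", "CONTAINS", "tx1")  # assumed relationship name

# addr_A redeems 0.8 BTC (in satoshis) as the transaction's incoming side.
g.add_node("addr_A", "Address")
g.add_node("addr_B", "Address")
g.add_node("in1", "IncomingTransaction", value=80_000_000)
g.add_edge("addr_A", "REDEEMING", "in1")
g.add_edge("in1", "PART_OF", "tx1")      # assumed relationship name

# The outgoing side sends 0.7 BTC to addr_B and 0.1 BTC back to addr_A.
g.add_node("out1", "OutgoingTransaction", value=70_000_000)
g.add_node("out2", "OutgoingTransaction", value=10_000_000)
for out_id, address in (("out1", "addr_B"), ("out2", "addr_A")):
    g.add_edge("tx1", "HAS_OUTPUT", out_id)  # assumed relationship name
    g.add_edge(out_id, "SENT_TO", address)   # assumed relationship name


def chain_back(graph, block_id):
    """Follow PREVIOUS relationships from a block back to the oldest block."""
    chain = [block_id]
    while graph.targets(chain[-1], "PREVIOUS"):
        chain.append(graph.targets(chain[-1], "PREVIOUS")[0])
    return chain


def total_redeemed(graph, address):
    """Sum the IncomingTransaction values that an address is redeeming."""
    return sum(graph.nodes[n][1]["value"]
               for n in graph.targets(address, "REDEEMING"))


def total_received(graph, address):
    """Sum the OutgoingTransaction values sent to an address."""
    return sum(props["value"]
               for node_id, (label, props) in graph.nodes.items()
               if label == "OutgoingTransaction"
               and address in graph.targets(node_id, "SENT_TO"))


print(chain_back(g, "block2"))
print(total_redeemed(g, "addr_A"), total_received(g, "addr_B"))
```

Giving the incoming and outgoing sides their own nodes means per-address counts and sums reduce to a one-hop lookup from the address, which is the benefit the model is aiming for.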
In part 2 of this blog series, we will discuss how we obtained a set of data, look at the data structure and talk about how to load the data into Neo4j.