Unsupervised Approaches to Detecting Anomalous Behavior in the Bitcoin Transaction Network

Bitcoin is an electronic crypto-currency created in 2008 by Satoshi Nakamoto (a pseudonym). At the time the original bitcoin client was written, the idea of a purely peer-to-peer (P2P) digital currency that did not require a trusted third party to confirm transactions and prevent double-spending was unique. In the bitcoin network, all transactions are public, which effectively renders double-spending impossible. A criminal who wishes to double-spend or falsify some segment of the transaction history must convince the majority of the bitcoin network that his transaction history is correct, and to do that he must provide the appropriate proof of work. Under the assumption that the majority of the network is honest, the criminal would need more computational power than the honest nodes combined in order to falsify the transaction history, as described in [1]. Since the inception of bitcoin, several other crypto-currencies have sprung into existence, but bitcoin continues to be the most popular.

Because transactions in the bitcoin network are specified by the public keys of the payer and payee, some level of anonymity is provided as long as public keys cannot be traced to real-world identities. For criminal organizations and other bitcoin users who require strong anonymity, this is not enough, so a so-called “mixing service” is employed. The mixing service takes in bitcoins from a group of individuals requiring strong anonymity, sends the coins around randomly in an attempt to obfuscate their origins, and then sends similar amounts of bitcoins back to new addresses specified by the individuals using the service. This is discussed in more detail in [2].

For our CS229 project, we were interested in using machine learning techniques to explore a dataset of bitcoin transactions; in particular, we were interested in exploring the anonymity guarantees of the bitcoin network. The questions we were hoping to answer are: