Breaking Bad: De-Anonymising Entity Types on the Bitcoin Blockchain Using Supervised Machine Learning

Bitcoin is a cryptocurrency whose transactions are recorded on a distributed, openly accessible ledger. On the Bitcoin Blockchain, an entity’s real-world identity is hidden behind a pseudonym, a so-called address. Bitcoin is therefore widely assumed to provide a high degree of anonymity, which drives its frequent use for illicit activities. This paper presents a novel approach for reducing the anonymity of the Bitcoin Blockchain by using Supervised Machine Learning to predict the type of yet-unidentified entities. We utilised a sample of 434 entities (with ≈ 200 million transactions), whose identity and type had been revealed, as training data and built classifiers that distinguish among 10 categories. Our main finding is that we can indeed predict the type of a yet-unidentified entity. Using the Gradient Boosting algorithm, we achieve an accuracy of 77% and an F1-score of ≈ 0.75. We discuss this approach to reducing anonymity on the Bitcoin Blockchain, its potential applications in forensics and financial compliance, and its societal implications; we also outline the study's limitations and propose directions for future research.
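To make the two reported metrics concrete, the following minimal sketch computes accuracy and an F1-score for a multi-class entity-type prediction. It is an illustration under stated assumptions, not the paper's evaluation code: the category names (`exchange`, `gambling`, `mixer`) and the label lists are hypothetical, and macro-averaging of the per-class F1 is assumed because the abstract does not state which averaging scheme was used.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of entities whose predicted type equals the true type."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), computed separately for each class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the entity was not p
            fn[t] += 1  # true class t was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical true and predicted entity types, for illustration only.
y_true = ["exchange", "exchange", "exchange", "gambling",
          "gambling", "mixer", "mixer", "mixer"]
y_pred = ["exchange", "exchange", "gambling", "gambling",
          "gambling", "mixer", "exchange", "mixer"]

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

Accuracy counts every entity equally, whereas the macro-averaged F1 gives each category the same weight regardless of its size, so the two can diverge when the categories are imbalanced.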
