Regulating Cryptocurrencies: A Supervised Machine Learning Approach to De-Anonymizing the Bitcoin Blockchain

Abstract

Bitcoin is a cryptocurrency whose transactions are recorded on a distributed, openly accessible ledger. On the Bitcoin Blockchain, an owning entity’s real-world identity is hidden behind a pseudonym, a so-called address. Bitcoin is therefore widely assumed to provide a high degree of anonymity, which is a driver of its frequent use for illicit activities. This paper presents a novel approach to de-anonymizing the Bitcoin Blockchain that uses Supervised Machine Learning to predict the type of yet-unidentified entities. We used a sample of 957 entities (with ≈385 million transactions) whose identity and type had already been revealed as training data, and we built classifiers that differentiate among 12 categories. Our main finding is that we can indeed predict the type of a yet-unidentified entity. Using the Gradient Boosting algorithm with default parameters, we achieve a mean cross-validation accuracy of 80.42% and an F1-score of ≈79.64%. We present two examples. In the first, we predict the types of a set of 22 clusters suspected to be related to cybercriminal activities. In the second, we classify 153,293 clusters to estimate the activity in the Bitcoin ecosystem. We discuss potential applications of our method for organizational regulation and compliance as well as its societal implications, outline the limitations of the study, and propose directions for future research. A prototype implementation of our method for organizational use is included in the appendix.
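The abstract describes the core workflow only at a high level. A Gradient Boosting classifier with default parameters is trained on labeled entities, evaluated by mean cross-validated accuracy and F1-score, and then applied to unidentified clusters. The following is a minimal sketch of that workflow, assuming scikit-learn. The feature matrix is synthetic, because the study's actual features are not described here. The 5-fold stratified split and the weighted F1 averaging are also assumptions and were not taken from the study. The helper names `evaluate` and `classify_unlabeled` are hypothetical.

```python
"""Minimal sketch: multi-class entity-type classification with Gradient Boosting.

Assumptions (not taken from the study): features are synthetic stand-ins from
make_classification, F1 uses "weighted" averaging, and cross-validation is
5-fold stratified. Helper names are hypothetical.
"""
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

N_ENTITIES = 957    # size of the labeled training sample reported in the abstract
N_CATEGORIES = 12   # number of entity categories reported in the abstract


def evaluate(X, y, n_splits=5, seed=0):
    """Return mean cross-validated accuracy and weighted F1 of a default GBM."""
    # Default hyperparameters; random_state only fixes the internal randomness.
    clf = GradientBoostingClassifier(random_state=seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_validate(clf, X, y, cv=cv, scoring=["accuracy", "f1_weighted"])
    return scores["test_accuracy"].mean(), scores["test_f1_weighted"].mean()


def classify_unlabeled(X_train, y_train, X_new, seed=0):
    """Fit on all labeled entities, then predict types for unidentified clusters."""
    clf = GradientBoostingClassifier(random_state=seed).fit(X_train, y_train)
    return clf.predict(X_new)


if __name__ == "__main__":
    # Hypothetical feature matrix: one row per entity (address cluster).
    X, y = make_classification(
        n_samples=N_ENTITIES + 22, n_features=12, n_informative=8,
        n_classes=N_CATEGORIES, random_state=0,
    )
    X_lab, y_lab = X[:N_ENTITIES], y[:N_ENTITIES]
    X_unlab = X[N_ENTITIES:]  # stands in for the 22 suspected clusters
    acc, f1 = evaluate(X_lab, y_lab)
    print(f"mean CV accuracy={acc:.4f}, mean weighted F1={f1:.4f}")
    print(classify_unlabeled(X_lab, y_lab, X_unlab))
```

Because the data are synthetic, the printed scores say nothing about the study's reported figures. The sketch only shows how cross-validated evaluation and out-of-sample prediction fit together. With real data, each row of `X` would hold per-cluster features derived from the Blockchain.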
