Revealing the Character of Nodes in a Blockchain With Supervised Learning

The term blockchain has its roots in cryptocurrencies. However, its applications are now more widespread, and in many areas this technology has become the foundation of distributed ledgers. The blockchain protocol assumes that all participants of the system are both contributors to and safeguards of this ledger, since the lack of a trusted third party requires other security precautions to maintain the consistency of transactions. In this work, we investigate whether the character of participants in a blockchain-based system, which the system does not require them to reveal explicitly, can be discovered by other means. To verify this, we built and publicly released a dataset of nearly 9,000 addresses of nodes in the most popular cryptocurrency, Bitcoin, and labelled them. These labels represent the character the nodes have in the network, e.g. miners or exchanges. We then developed a set of features that quantify the behaviour of nodes in the network and used supervised machine learning algorithms to find out whether the character of nodes can be revealed based on these features. With the F-score exceeding 95% for the best-performing algorithms, our results demonstrate that it is hard to hide the role a node has in a blockchain-based network. These results indicate that building trustworthy blockchain-based systems that fully comply with the original blockchain assumptions requires specific countermeasures to preserve the desired level of anonymity.
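As a rough illustration of the two ingredients mentioned above, behavioural features and the F-score, the following minimal sketch may help. It is not the feature set or evaluation code used in this work: the toy transaction list, the four features (transaction counts and amounts sent and received) and the function names `behaviour_features` and `f_score` are assumptions made purely for the example.

```python
from collections import defaultdict

# Hypothetical toy transactions: (sender, receiver, amount in BTC).
TRANSACTIONS = [
    ("A", "B", 1.0),
    ("A", "C", 2.0),
    ("B", "C", 0.5),
    ("C", "A", 3.0),
]


def behaviour_features(transactions):
    """Return {address: (sent_count, received_count, total_sent, total_received)}.

    These four features are illustrative assumptions, not the paper's features.
    """
    stats = defaultdict(lambda: [0, 0, 0.0, 0.0])
    for sender, receiver, amount in transactions:
        stats[sender][0] += 1        # one more outgoing transaction
        stats[sender][2] += amount   # amount sent
        stats[receiver][1] += 1      # one more incoming transaction
        stats[receiver][3] += amount # amount received
    return {address: tuple(values) for address, values in stats.items()}


def f_score(true_labels, predicted_labels, positive):
    """F1 score for one class: harmonic mean of precision and recall."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(behaviour_features(TRANSACTIONS))
    truth = ["miner", "miner", "exchange", "exchange"]
    guess = ["miner", "exchange", "exchange", "exchange"]
    print(f_score(truth, guess, "miner"), f_score(truth, guess, "exchange"))
```

In a setting like the one described, feature vectors of this kind would be passed, together with the node labels, to a supervised classifier, and the per-class F-scores would then summarise how well the character of nodes is recovered.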
