Clustering Blockchain Data

Blockchain datasets, such as those generated by Bitcoin, Ethereum, and other popular cryptocurrencies, are intriguing examples of big data. Analyzing these datasets has diverse applications, such as detecting fraud and illegal transactions, characterizing major services, identifying financial hotspots, and characterizing the usage and performance of large peer-to-peer, consensus-based systems. Unsupervised learning methods in general, and clustering methods in particular, have the potential to discover unanticipated patterns that lead to valuable insights. However, the volume, velocity, and variety of blockchain data, together with the difficulty of evaluating results, pose significant challenges to applying clustering methods to blockchain data efficiently and effectively. Nevertheless, recent and ongoing work has both adapted classic methods and developed new methods tailored to the characteristics of such data. This chapter motivates the study of clustering methods for blockchain data and introduces the key blockchain concepts from a data-centric perspective. It presents different models and methods used for clustering blockchain data, and describes the challenges of evaluating such methods along with some solutions.
