Detection of illicit accounts over the Ethereum blockchain

Abstract: The rise of cryptocurrencies and the benefits they offer have been overshadowed by a number of illegal activities operating over their networks, including money laundering, bribery, phishing, and fraud. In this work we focus on the Ethereum network, which has seen over 400 million transactions since its inception. Using 2179 accounts flagged by the Ethereum community for illegal activity, together with 2502 normal accounts, we seek to detect illicit accounts from their transaction history using the XGBoost classifier. Under 10-fold cross-validation, XGBoost achieved an average accuracy of 0.963 (± 0.006) and an average AUC of 0.994 (± 0.0007). The three features with the largest impact on the final model output were ‘Time diff between first and last (Mins)’, ‘Total Ether balance’ and ‘Min value received’. Based on these results we conclude that the proposed approach is highly effective in detecting illicit accounts over the Ethereum network. Our contribution is threefold: first, we propose an effective method to detect illicit accounts over the Ethereum network; second, we provide insights into the most important features; and third, we publish the compiled data set as a benchmark for future related work.
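The evaluation protocol summarised above (10-fold cross-validation, reporting mean accuracy and mean AUC with their spread) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' pipeline: the data are synthetic, the one-feature scorer `fit_threshold_scorer` is a hypothetical stand-in for XGBoost, the folds are not stratified, and the spread is computed as the sample standard deviation, which may differ from what the "±" values in the abstract denote.

```python
import math
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Shuffle range(n) and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as P(random positive scores above random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_threshold_scorer(x_train, y_train):
    """Hypothetical stand-in for XGBoost: logistic score around the class-mean midpoint."""
    m0 = statistics.mean(x for x, t in zip(x_train, y_train) if t == 0)
    m1 = statistics.mean(x for x, t in zip(x_train, y_train) if t == 1)
    mid = (m0 + m1) / 2
    direction = 1.0 if m1 >= m0 else -1.0
    return lambda x: 1.0 / (1.0 + math.exp(-direction * (x - mid)))


def cross_validate(X, y, fit, k=10, seed=0):
    """Return ((mean_acc, sd_acc), (mean_auc, sd_auc)) over k held-out folds."""
    folds = k_fold_indices(len(y), k, seed)
    accs, aucs = [], []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        score = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        y_test = [y[j] for j in test_idx]
        s_test = [score(X[j]) for j in test_idx]
        accs.append(accuracy(y_test, [int(s >= 0.5) for s in s_test]))
        aucs.append(roc_auc(y_test, s_test))
    return ((statistics.mean(accs), statistics.stdev(accs)),
            (statistics.mean(aucs), statistics.stdev(aucs)))


# Synthetic one-feature data (an assumption, not the published data set):
# label 0 = normal, label 1 = illicit.
rng = random.Random(42)
y = [0] * 100 + [1] * 100
X = [rng.gauss(2.0 * label, 1.0) for label in y]

(acc_mean, acc_sd), (auc_mean, auc_sd) = cross_validate(X, y, fit_threshold_scorer, k=10)
print(f"accuracy: {acc_mean:.3f} (± {acc_sd:.3f})")
print(f"AUC:      {auc_mean:.3f} (± {auc_sd:.3f})")
```

In practice the stand-in scorer would be replaced by a trained gradient-boosted model and the split by a stratified one; the aggregation of per-fold metrics stays the same.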
