Detecting DeFi Securities Violations from Token Smart Contract Code with Random Forest Classification

Decentralized Finance (DeFi) is a system of financial products and services built and delivered through smart contracts on various blockchains. In the past year, DeFi has gained popularity and market capitalization. However, it has also become an epicenter of cryptocurrency-related crime, in particular various types of securities violations. The lack of Know Your Customer requirements in DeFi has left governments unsure of how to handle the magnitude of offending in this space. This study aims to address this problem with a machine learning approach that identifies DeFi projects potentially engaging in securities violations based on their tokens’ smart contract code. We adapt prior work on detecting specific types of securities violations across Ethereum more broadly, building a random forest classifier on features extracted from the smart contract code of DeFi projects’ tokens. The final classifier achieves a 99.1% F1-score. Such high performance is surprising for any classification problem; however, further feature-level analysis shows that a single feature makes this a highly detectable problem. Another contribution of our study is a new dataset comprising (a) a verified ground truth dataset of tokens involved in securities violations and (b) a set of valid tokens from a DeFi aggregator that conducts due diligence on the projects it lists. This paper further discusses the use of our model by prosecutors in enforcement efforts and connects its potential use to the wider legal context.
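For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F1-score is computed from a classifier's confusion counts. The function name `f1_score` and the counts in the example are hypothetical and chosen for illustration; they are not taken from the study's data or results.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score, the harmonic mean of precision and recall.

    tp: violating tokens correctly flagged (true positives)
    fp: valid tokens wrongly flagged (false positives)
    fn: violating tokens missed (false negatives)
    """
    if tp == 0:
        # With no true positives, precision and recall are zero or undefined;
        # by convention the score is reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only; not the study's results.
example = f1_score(tp=90, fp=5, fn=15)
print(f"F1 = {example:.3f}")
```

Because F1 combines precision and recall, a score near 1 requires both few false alarms and few missed violations, which is why it is often preferred over plain accuracy when the classes are imbalanced.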
