Rug-pull malicious token detection on blockchain using supervised learning with feature engineering

The rapid development of blockchain and cryptocurrency over the past decade has created a huge demand for digital trading platforms. Popular decentralised exchanges (DEXs) such as Uniswap and PancakeSwap were created to fill this market gap. They enable cryptocurrency exchange without intermediaries, which eliminates the security and privacy issues associated with traditional centralised platforms. However, DEXs lack regulation, and this has led to the emergence of a host of damaging fraudulent investment schemes, including Ponzi schemes, honeypots, pump-and-dump schemes, and rug pulls.

In this study, we investigate the problem of detecting rug pulls on Uniswap using supervised learning. We compile a set of 23 features and propose a hybrid feature selection technique to identify the features most relevant to rug pulls. Using this refined feature set, our classifier outperforms the classifiers of previous studies, achieving an F1-score of 99%, a precision of 97% on non-malicious tokens, and a recall of 99% on malicious tokens. Additionally, we show that an XGBoost classifier built on the proposed features can distinguish scam tokens from newly listed tokens, which are often hard to tell apart because they share similar characteristics. We also propose a validation method.
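The abstract does not say how the hybrid feature selection combines its methods. The following is a minimal sketch that assumes a common filter-then-wrapper design. In the filter stage, each feature is screened independently by ANOVA F-score and by mutual information. In the wrapper stage, the surviving candidates are pruned by recursive feature elimination (RFE) around a tree ensemble. Several parts of the sketch are illustrative assumptions:

- The data are synthetic, generated by `make_classification` with 23 features to mirror the feature count.
- The helper `hybrid_select` and its thresholds `k_filter` and `k_final` are hypothetical.
- scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, which avoids an extra dependency.

The final lines report per-class metrics, because the headline precision refers to the non-malicious class and the recall refers to the malicious class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import RFE, f_classif, mutual_info_classif
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split


def hybrid_select(X, y, k_filter=12, k_final=8, random_state=0):
    """Hypothetical hybrid selector: filter stage (ANOVA F + mutual
    information) followed by a wrapper stage (RFE). Returns column indices."""
    # Filter stage: score every feature independently under two criteria.
    f_scores, _ = f_classif(X, y)
    mi_scores = mutual_info_classif(X, y, random_state=random_state)
    # Keep features ranked in the top k_filter under either criterion.
    top_f = set(np.argsort(f_scores)[::-1][:k_filter])
    top_mi = set(np.argsort(mi_scores)[::-1][:k_filter])
    candidates = np.array(sorted(top_f | top_mi))
    # Wrapper stage: RFE with a tree ensemble prunes the candidates to k_final.
    rfe = RFE(GradientBoostingClassifier(n_estimators=50, random_state=random_state),
              n_features_to_select=min(k_final, len(candidates)))
    rfe.fit(X[:, candidates], y)
    return candidates[rfe.support_]


# Synthetic stand-in for the token dataset: 23 features, imbalanced labels
# (1 = malicious, 0 = non-malicious). The class ratio is an arbitrary assumption.
X, y = make_classification(n_samples=600, n_features=23, n_informative=6,
                           n_redundant=4, weights=[0.3, 0.7], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

# Select features on the training split only, then train and predict.
selected = hybrid_select(X_tr, y_tr)
clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(X_tr[:, selected], y_tr)
pred = clf.predict(X_te[:, selected])

# Per-class metrics: index 0 is non-malicious, index 1 is malicious.
prec, rec, f1, _ = precision_recall_fscore_support(
    y_te, pred, labels=[0, 1], zero_division=0)
print("selected features:", selected.tolist())
print(f"precision (non-malicious): {prec[0]:.2f}")
print(f"recall (malicious): {rec[1]:.2f}")
print(f"macro F1: {np.mean(f1):.2f}")
```

Running selection on the training split alone keeps the test split unseen by both stages. `xgboost.XGBClassifier` exposes the same `fit`/`predict` interface and could replace the stand-in classifier.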