A Bytecode-based Approach for Smart Contract Classification

With the development of blockchain technologies, the number of smart contracts deployed on blockchain platforms is growing exponentially, which makes it difficult for users to find desired services by manual screening. The automatic classification of smart contracts can provide blockchain users with keyword-based contract searching and helps to manage smart contracts effectively. Current research on smart contract classification focuses on Natural Language Processing (NLP) solutions which are based on contract source code. However, more than 94% of smart contracts are not open-source, so the application scenarios of NLP methods are very limited. Meanwhile, NLP models are vulnerable to adversarial attacks. This paper proposes a classification model based on features from contract bytecode instead of source code to solve these problems. We also use feature selection and ensemble learning to optimize the model. Our experimental studies on over 3,300 real-world Ethereum smart contracts show that our model can classify smart contracts without source code and has better performance than baseline models. Our model also has good resistance to adversarial attacks compared with NLP-based models. In addition, our analysis reveals that account features used in many smart contract classification models have little effect on classification and can be excluded.

[1]  Alberto Maria Segre,et al.  Programs for Machine Learning , 1994 .

[2]  Vitalik Buterin A NEXT GENERATION SMART CONTRACT & DECENTRALIZED APPLICATION PLATFORM , 2015 .

[3]  Massimo Bartoletti,et al.  A Survey of Attacks on Ethereum Smart Contracts (SoK) , 2017, POST.

[4]  Prateek Saxena,et al.  Making Smart Contracts Smarter , 2016, IACR Cryptol. ePrint Arch..

[5]  Massimo Bartoletti,et al.  Data Mining for Detecting Bitcoin Ponzi Schemes , 2018, 2018 Crypto Valley Conference on Blockchain Technology (CVCBT).

[6]  Gavin Brown,et al.  Ensemble Learning , 2010, Encyclopedia of Machine Learning and Data Mining.

[7]  Jamal Bentahar,et al.  A Blockchain-Based Model for Cloud Service Quality Monitoring , 2020, IEEE Transactions on Services Computing.

[8]  Quan Z. Sheng,et al.  Adversarial Attacks on Deep Learning Models in Natural Language Processing: A Survey , 2019 .

[9]  Xiaodong Lin,et al.  Understanding Ethereum via Graph Analysis , 2018, IEEE INFOCOM 2018 - IEEE Conference on Computer Communications.

[10]  Daniel Davis Wood,et al.  ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER , 2014 .

[11]  Yoav Freund,et al.  Experiments with a New Boosting Algorithm , 1996, ICML.

[12]  Nick Szabo,et al.  Smart Contracts: Building Blocks for Digital Markets , 2018 .

[13]  Ron Kohavi,et al.  A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection , 1995, IJCAI.

[14]  Russell C. Eberhart,et al.  A discrete binary version of the particle swarm algorithm , 1997, 1997 IEEE International Conference on Systems, Man, and Cybernetics. Computational Cybernetics and Simulation.

[15]  Yi Zhao,et al.  Smart Contract Classification With a Bi-LSTM Based Approach , 2020, IEEE Access.

[16]  Fuchun Guo,et al.  Searchain: Blockchain-based private keyword search in decentralized storage , 2017, Future Gener. Comput. Syst..

[17]  Zhihua Cui,et al.  A Hybrid BlockChain-Based Identity Authentication Scheme for Multi-WSN , 2020, IEEE Transactions on Services Computing.

[18]  Marko Vukolic,et al.  Hyperledger fabric: a distributed operating system for permissioned blockchains , 2018, EuroSys.

[19]  Zibin Zheng,et al.  Detecting Ponzi Schemes on Ethereum: Towards Healthier Blockchain Technology , 2018, WWW.

[20]  Tianqi Chen,et al.  XGBoost: A Scalable Tree Boosting System , 2016, KDD.

[21]  Ferat Sahin,et al.  A survey on feature selection methods , 2014, Comput. Electr. Eng..

[22]  Mikel Galar,et al.  Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches , 2013, Knowl. Based Syst..

[23]  Steven Lee,et al.  Anomaly Detection in Bitcoin Network Using Unsupervised Learning Methods , 2016, ArXiv.

[24]  Yuming Zhou,et al.  A novel ensemble method for classifying imbalanced data , 2015, Pattern Recognit..

[25]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[26]  James Bailey,et al.  A Novel Scalable Multi-class ROC for Effective Visualization and Computation , 2010, PAKDD.

[27]  Randal S. Olson,et al.  Relief-Based Feature Selection: Introduction and Review , 2017, J. Biomed. Informatics.

[28]  Omer Rana,et al.  Tracking GDPR Compliance in Cloud-Based Service Delivery , 2020, IEEE Transactions on Services Computing.

[29]  Yang Wang,et al.  Cost-sensitive boosting for classification of imbalanced data , 2007, Pattern Recognit..

[30]  Tom Fawcett,et al.  An introduction to ROC analysis , 2006, Pattern Recognit. Lett..