Multi-Class classification of vulnerabilities in smart contracts using AWD-LSTM, with pre-trained encoder inspired from natural language processing

Vulnerability detection and safety of smart contracts are of paramount importance because of their immutable nature. Symbolic tools like OYENTE and MAIAN are typically used for vulnerability prediction in smart contracts. As these tools are computationally expensive, they are typically used to detect vulnerabilities until some predefined invocation depth. These tools require more search time as the invocation depth increases. Since the number of smart contracts is increasing exponentially, it is difficult to analyze the contracts using these traditional tools. Recently a machine learning technique called Long Short Term Memory (LSTM) has been used for binary classification, i.e., to predict whether a smart contract is vulnerable or not. This technique requires nearly constant search time as the invocation depth increases. In the present article, we have shown a multi-class classification, where we classify a smart contract in Suicidal, Prodigal, Greedy, or Normal categories. We used Average Stochastic Gradient Descent Weight-Dropped LSTM (AWD-LSTM), which is a variant of LSTM, to perform classification. We reduced the class imbalance (a large number of normal contracts as compared to other categories) by considering only the distinct opcode combination for normal contracts. We have achieved a weighted average Fbeta score of 90.0%. Hence, such techniques can be used to analyze a large number of smart contracts and help to improve the security of these contracts.

[1]  Leslie N. Smith,et al.  Cyclical Learning Rates for Training Neural Networks , 2015, 2017 IEEE Winter Conference on Applications of Computer Vision (WACV).

[2]  Prateek Saxena,et al.  Making Smart Contracts Smarter , 2016, IACR Cryptol. ePrint Arch..

[3]  Xiaotie Deng,et al.  Empirical analysis: stock market prediction via extreme learning machine , 2014, Neural Computing and Applications.

[4]  Alex Graves,et al.  Generating Sequences With Recurrent Neural Networks , 2013, ArXiv.

[5]  Yew-Soon Ong,et al.  Towards Safer Smart Contracts: A Sequence Learning Approach to Detecting Security Threats. , 2018 .

[6]  Richard Socher,et al.  Regularizing and Optimizing LSTM Language Models , 2017, ICLR.

[7]  Dong Yu,et al.  Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[8]  Yoshihiko Mochizuki,et al.  Surface object recognition with CNN and SVM in Landsat 8 images , 2015, 2015 14th IAPR International Conference on Machine Vision Applications (MVA).

[9]  Jürgen Schmidhuber,et al.  Learning to Forget: Continual Prediction with LSTM , 2000, Neural Computation.

[10]  Gianluca Stringhini,et al.  Tiresias: Predicting Security Events Through Deep Learning , 2018, CCS.

[11]  Prateek Saxena,et al.  Finding The Greedy, Prodigal, and Suicidal Contracts at Scale , 2018, ACSAC.

[12]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[13]  Zhong Chen,et al.  ReGuard: Finding Reentrancy Bugs in Smart Contracts , 2018, 2018 IEEE/ACM 40th International Conference on Software Engineering: Companion (ICSE-Companion).

[14]  Feifei Li,et al.  DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning , 2017, CCS.

[15]  Vitalik Buterin A NEXT GENERATION SMART CONTRACT & DECENTRALIZED APPLICATION PLATFORM , 2015 .

[16]  Sourav Sengupta,et al.  Towards Safer Smart Contracts: A Sequence Learning Approach to Detecting Vulnerabilities , 2018, ArXiv.

[17]  Yue Liu,et al.  Materials discovery and design using machine learning , 2017 .

[18]  Dinggang Shen,et al.  Machine Learning in Medical Imaging , 2012, Lecture Notes in Computer Science.

[19]  Zibin Zheng,et al.  Detecting Ponzi Schemes on Ethereum: Towards Healthier Blockchain Technology , 2018, WWW.

[20]  Vivek Vaidya,et al.  Lung nodule detection in CT using 3D convolutional neural networks , 2017, 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017).

[21]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[22]  Nick Szabo,et al.  Smart Contracts: Building Blocks for Digital Markets , 2018 .

[23]  B. Erickson,et al.  Machine Learning for Medical Imaging. , 2017, Radiographics : a review publication of the Radiological Society of North America, Inc.

[24]  Tara N. Sainath,et al.  Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups , 2012, IEEE Signal Processing Magazine.

[25]  Alok Choudhary,et al.  A predictive machine learning approach for microstructure optimization and materials design , 2015, Scientific Reports.

[26]  George Kurian,et al.  Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.

[27]  Dawn Xiaodong Song,et al.  Recognizing Functions in Binaries with Neural Networks , 2015, USENIX Security Symposium.

[28]  Stefano Bistarelli,et al.  Analysis of Ethereum Smart Contracts and Opcodes , 2019, AINA.

[29]  Christopher D. Manning,et al.  Effective Approaches to Attention-based Neural Machine Translation , 2015, EMNLP.

[30]  Sebastian Ruder,et al.  Universal Language Model Fine-tuning for Text Classification , 2018, ACL.

[31]  Daniel Davis Wood,et al.  ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER , 2014 .