Eth2Vec: Learning Contract-Wide Code Representations for Vulnerability Detection on Ethereum Smart Contracts

Ethereum smart contracts are programs that run on the Ethereum blockchain, and many smart contract vulnerabilities have been discovered in the past decade. Many security analysis tools have been created to detect such vulnerabilities, but their performance drops drastically when the code under analysis has been rewritten. In this paper, we propose Eth2Vec, a machine-learning-based static analysis tool for vulnerability detection that is robust against code rewrites in smart contracts. Existing machine-learning-based static analysis tools for vulnerability detection take as input features that analysts must create manually. In contrast, Eth2Vec automatically learns features of vulnerable Ethereum Virtual Machine (EVM) bytecode, including tacit knowledge, through a neural network for natural language processing. As a result, Eth2Vec can detect vulnerabilities in smart contracts by comparing the code similarity between the target EVM bytecode and the EVM bytecode it has already learned. We conducted experiments with existing open databases, such as Etherscan, and our results show that Eth2Vec outperforms existing work in terms of well-known metrics, namely precision, recall, and F1-score. Moreover, Eth2Vec can detect vulnerabilities even in rewritten code.
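
The similarity-based detection described above can be illustrated with a minimal sketch. It assumes that bytecode has already been mapped to fixed-length embedding vectors and that a target is labeled by its nearest learned vector under cosine similarity. The function names, the toy vectors, and the labels are hypothetical and chosen for illustration only; they are not taken from Eth2Vec itself, and the sketch does not reproduce its neural network.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target, labeled_vectors):
    """Return (label, score) of the learned vector closest to the target."""
    best_label, best_score = None, -1.0
    for label, vector in labeled_vectors:
        score = cosine_similarity(target, vector)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Hypothetical toy embeddings standing in for learned EVM bytecode vectors.
LEARNED = [
    ("reentrancy", [0.9, 0.1, 0.0]),
    ("integer_overflow", [0.0, 0.8, 0.6]),
    ("no_known_vulnerability", [0.1, 0.1, 0.9]),
]

# Hypothetical embedding of a target contract to be checked.
label, score = most_similar([0.8, 0.2, 0.1], LEARNED)
print(label, round(score, 3))
```

In this toy setting a rewritten contract would still be flagged as long as its embedding stays close to that of the original vulnerable code, which is the intuition behind robustness to rewrites.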
