When Deep Learning Meets Smart Contracts

Ethereum has become a widely used platform to enable secure, Blockchain-based financial and business transactions. However, many identified bugs and vulnerabilities in smart contracts have led to serious financial losses, which raises serious concerns about smart contract security. Thus, there is a significant need to better maintain smart contract code and ensure its high reliability. In this research: (1) Firstly, we propose an automated deep learning based approach to learn structural code embeddings of smart contracts in Solidity, which is useful for clone detection, bug detection and contract validation on smart contracts. We apply our approach to more than 22K solidity contracts collected from the Ethereum blockchain, results show that the clone ratio of solidity code is at around 90%, much higher than traditional software. We collect a list of 52 known buggy smart contracts belonging to 10 kinds of common vulnerabilities as our bug database. Our approach can identify more than 1000 clone related bugs based on our bug databases efficiently and accurately. (2) Secondly, according to developers' feedback, we have implemented the approach in a web-based tool, named Smartembed, to facilitate Solidity developers for using our approach. Our tool can assist Solidity developers to efficiently identify repetitive smart contracts in the existing Ethereum blockchain, as well as checking their contract against a known set of bugs. which can help to improve the users' confidence in the reliability of the contract. We optimize the implementations of Smartembed which is sufficient in supporting developers in real-time for practical uses. The Ethereum ecosystem as well as the individual Solidity developer can both benefit from our research. Smartembed website: http://www.smartembed.tools Demo video: https://youtu.be/o9ylyOpYFq8 Replication package: https://github.com/beyondacm/SmartEmbed

[1]  Xinli Yang,et al.  Deep Learning for Just-in-Time Defect Prediction , 2015, 2015 IEEE International Conference on Software Quality, Reliability and Security.

[2]  Prateek Saxena,et al.  Making Smart Contracts Smarter , 2016, IACR Cryptol. ePrint Arch..

[3]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[4]  Barbara G. Ryder,et al.  CCLearner: A Deep Learning-Based Clone Detection Approach , 2017, 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[5]  Peng Jiang,et al.  A Survey on the Security of Blockchain Systems , 2017, Future Gener. Comput. Syst..

[6]  Lei Wu,et al.  Characterizing Code Clones in the Ethereum Smart Contract Ecosystem , 2019, Financial Cryptography.

[7]  Yoshua Bengio,et al.  Word Representations: A Simple and General Method for Semi-Supervised Learning , 2010, ACL.

[8]  Hailong Sun,et al.  A Novel Neural Source Code Representation Based on Abstract Syntax Tree , 2019, 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE).

[9]  Massimo Bartoletti,et al.  A Survey of Attacks on Ethereum Smart Contracts (SoK) , 2017, POST.

[10]  Massimo Bartoletti,et al.  Financial Cryptography and Data Security , 2017, Lecture Notes in Computer Science.

[11]  Nikhil Swamy,et al.  Formal Verification of Smart Contracts: Short Paper , 2016, PLAS@CCS.

[12]  Elaine Shi,et al.  Step by Step Towards Creating a Safe Smart Contract: Lessons and Insights from a Cryptocurrency Lab , 2016, Financial Cryptography Workshops.

[13]  Lingxiao Jiang,et al.  Checking Smart Contracts With Structural Code Embedding , 2020, IEEE Transactions on Software Engineering.

[14]  Petar Tsankov,et al.  Securify: Practical Security Analysis of Smart Contracts , 2018, CCS.

[15]  Martin White,et al.  Deep learning code fragments for code clone detection , 2016, 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE).

[16]  Mirella Lapata,et al.  Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics , 1999, ACL 1999.

[17]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[18]  Sergei Tikhomirov,et al.  SmartCheck: Static Analysis of Ethereum Smart Contracts , 2018, 2018 IEEE/ACM 1st International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB).

[19]  Tomas Mikolov,et al.  Enriching Word Vectors with Subword Information , 2016, TACL.

[20]  David Lo,et al.  Bug Characteristics in Blockchain Systems: A Large-Scale Empirical Study , 2017, 2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR).

[21]  Xiapu Luo,et al.  Under-optimized smart contracts devour your money , 2017, 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER).

[22]  Jian Li,et al.  Software Defect Prediction via Convolutional Neural Network , 2017, 2017 IEEE International Conference on Software Quality, Reliability and Security (QRS).

[23]  Thomas Heinz Meitinger,et al.  Smart Contracts , 2017, Informatik-Spektrum.

[24]  Xiao Ma,et al.  From Word Embeddings to Document Similarities for Improved Information Retrieval in Software Engineering , 2016, 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE).

[25]  David Lo,et al.  SmartEmbed: A Tool for Clone and Bug Detection in Smart Contracts through Structural Code Embedding , 2019, 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME).

[26]  Zhendong Su,et al.  DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones , 2007, 29th International Conference on Software Engineering (ICSE'07).

[27]  Chao Liu,et al.  EClone: detect semantic clones in Ethereum via symbolic transaction sketch , 2018, ESEC/SIGSOFT FSE.