A Data Science Approach for Honeypot Detection in Ethereum

Ethereum smart contracts have recently drawn considerable attention from the media, the financial industry and academia. As their popularity grew, malicious users found new opportunities to profit by deceiving newcomers. Consequently, attackers started luring other attackers into contracts that appear to have exploitable flaws but actually contain a complex hidden trap that ultimately benefits the contract creator. In the blockchain community, these contracts are known as honeypots. A recent study presented HONEYBADGER, a tool that uses symbolic execution to detect honeypots by analyzing contract bytecode. In this paper, we present a data science detection approach based primarily on contract transaction behavior. We partition all possible cases of fund movement between the contract creator, the contract, the transaction sender and other participants. To these fund-flow features we add aggregated transaction features, such as the number of transactions and their mean value, and other contract features, such as compilation information and source code length. We find that all of these feature categories contain useful information for detecting honeypots. Moreover, our approach detects new, previously undetected honeypots that use already known techniques. We further test whether our method can detect unknown honeypot techniques by removing one technique at a time from the training set, and we show that it is capable of discovering the removed techniques. Finally, we discover two new, previously unknown techniques.
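
As a minimal sketch of the kind of feature extraction described above, the following Python code assigns each transaction to exactly one fund-flow case and computes two aggregated features. The `Tx` record, its field names, the five boolean flags and the `contract_features` helper are all hypothetical simplifications chosen for illustration; they are not the feature definitions used in the paper.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Tx:
    """Hypothetical, simplified view of one transaction sent to a contract."""
    sender: str      # address that sent the transaction
    value_in: int    # wei sent to the contract
    to_creator: int  # wei the contract paid to its creator during this tx
    to_sender: int   # wei paid back to the sender (when not the creator)
    to_other: int    # wei paid to any other address


def fund_flow_case(tx: Tx, creator: str) -> tuple:
    """Map a transaction to one case; every tx falls into exactly one tuple."""
    return (
        tx.sender == creator,  # was it sent by the contract creator?
        tx.value_in > 0,       # did funds move into the contract?
        tx.to_creator > 0,     # did funds move to the creator?
        tx.to_sender > 0,      # did funds move back to the sender?
        tx.to_other > 0,       # did funds move to another participant?
    )


def contract_features(txs: list, creator: str) -> dict:
    """Aggregate per-contract features (an assumed, illustrative subset)."""
    cases = Counter(fund_flow_case(tx, creator) for tx in txs)
    n = len(txs)
    return {
        "n_transactions": n,
        "mean_value": mean(tx.value_in for tx in txs) if txs else 0.0,
        # Relative frequency of each observed fund-flow case.
        "case_frequencies": {case: count / n for case, count in cases.items()},
    }


# Toy history: creator funds the contract, a victim deposits, creator withdraws.
example_txs = [
    Tx(sender="0xC", value_in=10, to_creator=0, to_sender=0, to_other=0),
    Tx(sender="0xV", value_in=5, to_creator=0, to_sender=0, to_other=0),
    Tx(sender="0xC", value_in=0, to_creator=15, to_sender=0, to_other=0),
]
features = contract_features(example_txs, creator="0xC")
```

A feature dictionary like `features` could then be flattened into a fixed-length vector, with one column per fund-flow case, and passed to a classifier alongside contract features such as source code length.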