SquirRL: Automating Attack Discovery on Blockchain Incentive Mechanisms with Deep Reinforcement Learning

Incentive mechanisms are central to the functionality of permissionless blockchains: they incentivize participants to run and secure the underlying consensus protocol. Designing incentive-compatible incentive mechanisms is notoriously challenging, however. Even systems with strong theoretical security guarantees in traditional settings, where users are either Byzantine or honest, often exclude analysis of rational users, who may exploit incentives to deviate from honest behavior. As a result, most public blockchains today use incentive mechanisms whose security properties are poorly understood and largely untested. In this work, we propose SquirRL, a framework for using deep reinforcement learning to identify attack strategies on blockchain incentive mechanisms. With minimal setup, SquirRL replicates known theoretical results on the Bitcoin protocol. In more complex and realistic settings, as when mining power varies over time, it identifies attack strategies superior to those known in the literature. Finally, SquirRL yields results suggesting that classical selfish mining attacks against Bitcoin lose effectiveness in the presence of multiple attackers. These results shed light on why selfish mining, which is unobserved to date in the wild, may be a poor attack strategy.

[1]  Ethan Heilman,et al.  Low-Resource Eclipse Attacks on Ethereum's Peer-to-Peer Network , 2020, IACR Cryptol. ePrint Arch..

[2]  Elman Mansimov,et al.  Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation , 2017, NIPS.

[3]  Demis Hassabis,et al.  Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm , 2017, ArXiv.

[4]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[5]  Alec Radford,et al.  Proximal Policy Optimization Algorithms , 2017, ArXiv.

[6]  Ari Juels,et al.  Flash Boys 2.0: Frontrunning, Transaction Reordering, and Consensus Instability in Decentralized Exchanges , 2019, ArXiv.

[7]  Xing Wang,et al.  A Deep Dive Into Blockchain Selfish Mining , 2018, ICC 2019 - 2019 IEEE International Conference on Communications (ICC).

[8]  Aviv Zohar,et al.  Secure High-Rate Transaction Processing in Bitcoin , 2015, Financial Cryptography.

[9]  S. Matthew Weinberg,et al.  On the Instability of Bitcoin Without the Block Reward , 2016, CCS.

[10]  Tamer Basar,et al.  Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms , 2019, Handbook of Reinforcement Learning and Control.

[11]  Michael L. Littman,et al.  Markov Games as a Framework for Multi-Agent Reinforcement Learning , 1994, ICML.

[12]  T. Maugh Why buy when you can rent? , 1984, Science.

[13]  Aggelos Kiayias,et al.  The Bitcoin Backbone Protocol: Analysis and Applications , 2015, EUROCRYPT.

[14]  Ethan Heilman,et al.  Eclipse Attacks on Bitcoin's Peer-to-Peer Network , 2015, USENIX Security Symposium.

[15]  Vitalik Buterin,et al.  Incentives in Ethereum’s Hybrid Casper Protocol , 2019, 2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC).

[16]  Wei Xu,et al.  Scaling Nakamoto Consensus to Thousands of Transactions per Second , 2018, ArXiv.

[17]  Alexander Spiegelman,et al.  Mind the Mining , 2019, EC.

[18]  Michael I. Jordan,et al.  RLlib: Abstractions for Distributed Reinforcement Learning , 2017, ICML.

[19]  Peng Ning,et al.  Improving learning and adaptation in security games by exploiting information asymmetry , 2015, 2015 IEEE Conference on Computer Communications (INFOCOM).

[20]  Lantao Yu,et al.  Deep Reinforcement Learning for Green Security Games with Real-Time Information , 2018, AAAI.

[21]  Christian Decker,et al.  Information propagation in the Bitcoin network , 2013, IEEE P2P 2013 Proceedings.

[22]  Dipankar Dasgupta,et al.  Game theory for cyber security , 2010, CSIIRW '10.

[23]  Alf Zugenmaier,et al.  The Impact of Uncle Rewards on Selfish Mining in Ethereum , 2018, 2018 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW).

[24]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[25]  U. Rieder,et al.  Markov Decision Processes , 2010 .

[26]  Jonathan Katz,et al.  Competing (Semi-)Selfish Miners in Bitcoin , 2019, AFT.

[27]  Aron Laszka,et al.  When Bitcoin Mining Pools Run Dry - A Game-Theoretic Analysis of the Long-Term Impact of Attacks Between Mining Pools , 2015, Financial Cryptography Workshops.

[28]  Lear Bahack,et al.  Theoretical Bitcoin Attacks with less than Half of the Computational Power (draft) , 2013, IACR Cryptol. ePrint Arch..

[29]  Csaba Szepesvári,et al.  Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.

[30]  Meni Rosenfeld,et al.  Analysis of Bitcoin Pooled Mining Reward Systems , 2011, ArXiv.

[31]  Ittay Eyal,et al.  The Miner's Dilemma , 2014, 2015 IEEE Symposium on Security and Privacy.

[32]  Mehdi Shajari,et al.  Block withholding game among bitcoin mining pools , 2019, Future Gener. Comput. Syst..

[33]  Sailik Sengupta,et al.  Multi-agent Reinforcement Learning in Bayesian Stackelberg Markov Games for Adaptive Moving Target Defense , 2020, ArXiv.

[34]  Jeremy Clark,et al.  SoK: Transparent Dishonesty: Front-Running Attacks on Blockchain , 2019, Financial Cryptography Workshops.

[35]  Jonathan Katz,et al.  Incentivizing Blockchain Forks via Whale Transactions , 2017, Financial Cryptography Workshops.

[36]  Jakub W. Pachocki,et al.  Dota 2 with Large Scale Deep Reinforcement Learning , 2019, ArXiv.

[37]  Kartik Nayak,et al.  Stubborn Mining: Generalizing Selfish Mining and Combining with an Eclipse Attack , 2016, 2016 IEEE European Symposium on Security and Privacy (EuroS&P).

[38]  Tyler Moore,et al.  Empirical Analysis of Denial-of-Service Attacks in the Bitcoin Ecosystem , 2014, Financial Cryptography Workshops.

[39]  Peter Henderson,et al.  An Introduction to Deep Reinforcement Learning , 2018, Found. Trends Mach. Learn..

[40]  Prateek Saxena,et al.  On Power Splitting Games in Distributed Computation: The Case of Bitcoin Pooled Mining , 2015, 2015 IEEE 28th Computer Security Foundations Symposium.

[41]  Chen Feng,et al.  Selfish Mining in Ethereum , 2019, 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS).

[42]  Atul Singh,et al.  Eclipse Attacks on Overlay Networks: Threats and Defenses , 2006, Proceedings IEEE INFOCOM 2006. 25TH IEEE International Conference on Computer Communications.

[43]  Sanjay Jain,et al.  When Cryptocurrencies Mine Their Own Business , 2016, Financial Cryptography.

[44]  Cyril Grunspan,et al.  Selfish Mining in Ethereum , 2019, ArXiv.

[45]  Aviv Zohar,et al.  Optimal Selfish Mining Strategies in Bitcoin , 2015, Financial Cryptography.

[46]  Robert Tappan Morris,et al.  Security Considerations for Peer-to-Peer Distributed Hash Tables , 2002, IPTPS.

[47]  Marc Jansen,et al.  Short Paper: Revisiting Difficulty Control for Blockchain Systems , 2017, DPM/CBT@ESORICS.

[48]  Elaine Shi,et al.  FruitChains: A Fair Blockchain , 2017, IACR Cryptol. ePrint Arch..

[49]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[50]  Vijay Janapa Reddi,et al.  Deep Reinforcement Learning for Cyber Security , 2019, IEEE Transactions on Neural Networks and Learning Systems.

[51]  Miguel Castro,et al.  Secure routing for structured peer-to-peer overlay networks , 2002, OSDI '02.

[52]  Serge Fehr,et al.  Towards optimal robust secret sharing with security against a rushing adversary , 2019, IACR Cryptol. ePrint Arch..

[53]  Igor Mordatch,et al.  Emergent Tool Use From Multi-Agent Autocurricula , 2019, ICLR.

[54]  DR. Gavin Wood POLKADOT: VISION FOR A HETEROGENEOUS MULTI-CHAIN FRAMEWORK , 2016 .

[55]  Edgar R. Weippl,et al.  Pay-To-Win: Incentive Attacks on Proof-of-Work Cryptocurrencies , 2019, IACR Cryptol. ePrint Arch..

[56]  Ion Stoica,et al.  Ray RLLib: A Composable and Scalable Reinforcement Learning Library , 2017, NIPS 2017.

[57]  Sham M. Kakade,et al.  On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift , 2019, J. Mach. Learn. Res..

[58]  Sarah Meiklejohn,et al.  Smart contracts for bribing miners , 2018, IACR Cryptol. ePrint Arch..

[59]  David C. Parkes,et al.  Selfish Behavior in the Tezos Proof-of-Stake Protocol , 2019, ArXiv.

[60]  Pasin Manurangsi,et al.  Nearly Optimal Robust Secret Sharing against Rushing Adversaries , 2020, IACR Cryptol. ePrint Arch..

[61]  Vitalik Buterin,et al.  Casper the Friendly Finality Gadget , 2017, ArXiv.

[62]  David Silver,et al.  Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.

[63]  Peter Dayan,et al.  Q-learning , 1992, Machine Learning.

[64]  Tyler Moore,et al.  Game-Theoretic Analysis of DDoS Attacks Against Bitcoin Mining Pools , 2014, Financial Cryptography Workshops.

[65]  Igor Kabashkin,et al.  Risk Modelling of Blockchain Ecosystem , 2017, NSS.

[66]  Emin Gün Sirer,et al.  Majority Is Not Enough: Bitcoin Mining Is Vulnerable , 2013, Financial Cryptography.

[67]  Manuela Veloso,et al.  Scalable Learning in Stochastic Games , 2002 .

[68]  Daniel Davis Wood,et al.  ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER , 2014 .

[69]  Silvio Micali,et al.  Algorand: Scaling Byzantine Agreements for Cryptocurrencies , 2017, IACR Cryptol. ePrint Arch..

[70]  Hubert Ritzdorf,et al.  On the Security and Performance of Proof of Work Blockchains , 2016, IACR Cryptol. ePrint Arch..