SquirRL: Automating Attack Discovery on Blockchain Incentive Mechanisms with Deep Reinforcement Learning

Incentive mechanisms are central to the functionality of permissionless blockchains: they motivate participants to run and secure the underlying consensus protocol. Designing incentive-compatible mechanisms is notoriously challenging, however. Even systems with strong theoretical security guarantees in traditional settings, where users are either Byzantine or honest, often omit analysis of rational users, who may exploit incentives to deviate from honest behavior. As a result, most public blockchains today use incentive mechanisms whose security properties are poorly understood and largely untested. In this work, we propose SquirRL, a framework for using deep reinforcement learning to identify attack strategies on blockchain incentive mechanisms. With minimal setup, SquirRL replicates known theoretical results on the Bitcoin protocol. In more complex and realistic settings, such as when mining power varies over time, it identifies attack strategies superior to those known in the literature. Finally, SquirRL yields results suggesting that classical selfish mining attacks against Bitcoin lose effectiveness in the presence of multiple attackers. These results shed light on why selfish mining, which has not been observed in the wild to date, may be a poor attack strategy.


[33]  Peter Henderson,et al.  An Introduction to Deep Reinforcement Learning , 2018, Found. Trends Mach. Learn..

[34]  Jonathan Katz,et al.  Incentivizing Blockchain Forks via Whale Transactions , 2017, Financial Cryptography Workshops.

[35]  Peng Ning,et al.  Improving learning and adaptation in security games by exploiting information asymmetry , 2015, 2015 IEEE Conference on Computer Communications (INFOCOM).

[36]  Serge Fehr,et al.  Towards optimal robust secret sharing with security against a rushing adversary , 2019, IACR Cryptol. ePrint Arch..

[37]  Manuela Veloso,et al.  Scalable Learning in Stochastic Games , 2002 .

[38]  Csaba Szepesvári,et al.  Algorithms for Reinforcement Learning , 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.

[39]  Robert Tappan Morris,et al.  Security Considerations for Peer-to-Peer Distributed Hash Tables , 2002, IPTPS.

[40]  Vitalik Buterin,et al.  Incentives in Ethereum’s Hybrid Casper Protocol , 2019, 2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC).

[41]  T. Maugh Why buy when you can rent? , 1984, Science.

[42]  Sailik Sengupta,et al.  Multi-agent Reinforcement Learning in Bayesian Stackelberg Markov Games for Adaptive Moving Target Defense , 2020, ArXiv.

[43]  Cyril Grunspan,et al.  Selfish Mining in Ethereum , 2019, ArXiv.

[44]  R. J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[45]  Alf Zugenmaier,et al.  The Impact of Uncle Rewards on Selfish Mining in Ethereum , 2018, 2018 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW).

[46]  Hubert Ritzdorf,et al.  On the Security and Performance of Proof of Work Blockchains , 2016, IACR Cryptol. ePrint Arch..

[47]  Lantao Yu,et al.  Deep Reinforcement Learning for Green Security Games with Real-Time Information , 2018, AAAI.

[48]  David Silver,et al.  Deep Reinforcement Learning with Double Q-Learning , 2015, AAAI.

[49]  Elaine Shi,et al.  FruitChains: A Fair Blockchain , 2017, IACR Cryptol. ePrint Arch..

[50]  Tyler Moore,et al.  Empirical Analysis of Denial-of-Service Attacks in the Bitcoin Ecosystem , 2014, Financial Cryptography Workshops.

[51]  Demis Hassabis,et al.  Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm , 2017, ArXiv.

[52]  Aggelos Kiayias,et al.  The Bitcoin Backbone Protocol: Analysis and Applications , 2015, EUROCRYPT.

[53]  Michael I. Jordan,et al.  RLlib: Abstractions for Distributed Reinforcement Learning , 2017, ICML.

[54]  Demis Hassabis,et al.  Mastering the game of Go with deep neural networks and tree search , 2016, Nature.

[55]  Ittay Eyal,et al.  The Miner's Dilemma , 2014, 2015 IEEE Symposium on Security and Privacy.

[56]  Tamer Basar,et al.  Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms , 2019, Handbook of Reinforcement Learning and Control.

[57]  Prateek Saxena,et al.  On Power Splitting Games in Distributed Computation: The Case of Bitcoin Pooled Mining , 2015, 2015 IEEE 28th Computer Security Foundations Symposium.

[58]  Miguel Castro,et al.  Secure routing for structured peer-to-peer overlay networks , 2002, OSDI '02.

[59]  Yishay Mansour,et al.  Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.

[60]  Sarah Meiklejohn,et al.  Smart contracts for bribing miners , 2018, IACR Cryptol. ePrint Arch..

[61]  Ari Juels,et al.  Flash Boys 2.0: Frontrunning, Transaction Reordering, and Consensus Instability in Decentralized Exchanges , 2019, ArXiv.

[62]  Christian Decker,et al.  Information propagation in the Bitcoin network , 2013, IEEE P2P 2013 Proceedings.

[63]  Elman Mansimov,et al.  Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation , 2017, NIPS.

[64]  Edgar R. Weippl,et al.  Pay-To-Win: Incentive Attacks on Proof-of-Work Cryptocurrencies , 2019, IACR Cryptol. ePrint Arch..

[65]  Silvio Micali,et al.  Algorand: Scaling Byzantine Agreements for Cryptocurrencies , 2017, IACR Cryptol. ePrint Arch..

[66]  Daniel Davis Wood,et al.  ETHEREUM: A SECURE DECENTRALISED GENERALISED TRANSACTION LEDGER , 2014 .

[67]  Tyler Moore,et al.  Game-Theoretic Analysis of DDoS Attacks Against Bitcoin Mining Pools , 2014, Financial Cryptography Workshops.