SquirRL: Automating Attack Analysis on Blockchain Incentive Mechanisms with Deep Reinforcement Learning

Incentive mechanisms are central to the functionality of permissionless blockchains: they incentivize participants to run and secure the underlying consensus protocol. Designing incentive-compatible incentive mechanisms is notoriously challenging, however. As a result, most public blockchains today use incentive mechanisms whose security properties are poorly understood and largely untested. In this work, we propose SquirRL, a framework for using deep reinforcement learning to analyze attacks on blockchain incentive mechanisms. We demonstrate SquirRL’s power by first recovering known attacks: (1) the optimal selfish mining attack in Bitcoin [56], and (2) the Nash equilibrium in block withholding attacks [18]. We also use SquirRL to obtain several novel empirical results. First, we discover a counterintuitive flaw in the widely used rushing adversary model when applied to multi-agent Markov games with incomplete information. Second, we demonstrate that the optimal selfish mining strategy identified in [56] is not a Nash equilibrium in the multi-agent selfish mining setting. In fact, our results suggest (but do not prove) that when more than two competing agents engage in selfish mining, there is no profitable Nash equilibrium. This is consistent with the absence of observed selfish mining in the wild. Third, we find a novel attack on a simplified version of Ethereum’s finalization mechanism, Casper the Friendly Finality Gadget (FFG), which allows a strategic agent to amplify her rewards by up to 30%. Notably, [12] shows that honest voting is a Nash equilibrium in Casper FFG; our attack shows that this no longer holds when Casper FFG is composed with selfish mining. Altogether, our experiments demonstrate SquirRL’s flexibility and promise as a framework for studying attack settings that have thus far eluded theoretical and empirical understanding.

Keywords—Blockchain, Deep reinforcement learning, Incentive mechanisms

∗Equal contribution
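The selfish mining setting the abstract refers to can be captured as a small Markov decision process. The sketch below is illustrative only, not SquirRL's actual environment: it is a Monte Carlo simulation of the classic Eyal-Sirer selfish mining strategy [19] (the fixed strategy whose optimal generalization is derived in [30]); the function name and its defaults are our own.

```python
import random

def selfish_mining_revenue(alpha, gamma, n_blocks=200_000, seed=0):
    """Monte Carlo estimate of a selfish pool's relative revenue under the
    Eyal-Sirer strategy. alpha = pool's fraction of hash power; gamma =
    fraction of honest miners that build on the pool's branch during a tie."""
    rng = random.Random(seed)
    pool_rev = honest_rev = 0
    lead = 0      # length of the pool's private chain beyond the public chain
    tie = False   # True while two competing branches of length 1 race
    for _ in range(n_blocks):
        pool_finds = rng.random() < alpha
        if tie:
            if pool_finds:              # pool extends its branch, winning both blocks
                pool_rev += 2
            elif rng.random() < gamma:  # honest block lands on pool's branch: split
                pool_rev += 1
                honest_rev += 1
            else:                       # honest branch wins both blocks
                honest_rev += 2
            tie = False
        elif pool_finds:
            lead += 1                   # withhold the new block
        elif lead == 0:
            honest_rev += 1             # nothing withheld; public chain grows
        elif lead == 1:
            tie = True                  # pool reveals its block: 1-vs-1 race
            lead = 0
        elif lead == 2:
            pool_rev += 2               # pool reveals both blocks and overrides
            lead = 0
        else:
            pool_rev += 1               # lead > 2: pool's oldest block is now safe
            lead -= 1
    return pool_rev / (pool_rev + honest_rev)
```

An honest miner's relative revenue equals its hash power alpha; with alpha = 0.4 and gamma = 0 this strategy yields roughly 0.48, while below the ~1/3 profitability threshold (e.g. alpha = 0.1) it earns less than honest mining. An RL agent in SquirRL's setting searches over policies in a state space like this one rather than following the fixed strategy hard-coded above.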

[1] T. Maugh, Why buy when you can rent?, 1984, Science.

[2] Michael L. Littman, et al. Markov Games as a Framework for Multi-Agent Reinforcement Learning, 1994, ICML.

[3] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.

[4] Vijay R. Konda, et al. Actor-Critic Algorithms, 1999, NIPS.

[5] Manuela Veloso, et al. Scalable Learning in Stochastic Games, 2002.

[6] Robert Tappan Morris, et al. Security Considerations for Peer-to-Peer Distributed Hash Tables, 2002, IPTPS.

[7] Miguel Castro, et al. Secure routing for structured peer-to-peer overlay networks, 2002, OSDI '02.

[8] Peter Dayan, et al. Q-learning, 1992, Machine Learning.

[9] Ronald J. Williams, et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.

[10] Atul Singh, et al. Eclipse Attacks on Overlay Networks: Threats and Defenses, 2006, Proceedings of IEEE INFOCOM 2006, 25th IEEE International Conference on Computer Communications.

[11] Dipankar Dasgupta, et al. Game theory for cyber security, 2010, CSIIRW '10.

[12] U. Rieder, et al. Markov Decision Processes, 2010.

[13] Csaba Szepesvári, et al. Algorithms for Reinforcement Learning, 2010, Synthesis Lectures on Artificial Intelligence and Machine Learning.

[14] Meni Rosenfeld, et al. Analysis of Bitcoin Pooled Mining Reward Systems, 2011, ArXiv.

[15] Christian Decker, et al. Information propagation in the Bitcoin network, 2013, IEEE P2P 2013 Proceedings.

[16] Lear Bahack, et al. Theoretical Bitcoin Attacks with less than Half of the Computational Power (draft), 2013, IACR Cryptol. ePrint Arch.

[17] Tyler Moore, et al. Game-Theoretic Analysis of DDoS Attacks Against Bitcoin Mining Pools, 2014, Financial Cryptography Workshops.

[18] Tyler Moore, et al. Empirical Analysis of Denial-of-Service Attacks in the Bitcoin Ecosystem, 2014, Financial Cryptography Workshops.

[19] Emin Gün Sirer, et al. Majority Is Not Enough: Bitcoin Mining Is Vulnerable, 2013, Financial Cryptography.


[21] Peng Ning, et al. Improving learning and adaptation in security games by exploiting information asymmetry, 2015, IEEE Conference on Computer Communications (INFOCOM).

[22] Aron Laszka, et al. When Bitcoin Mining Pools Run Dry - A Game-Theoretic Analysis of the Long-Term Impact of Attacks Between Mining Pools, 2015, Financial Cryptography Workshops.

[23] Ethan Heilman, et al. Eclipse Attacks on Bitcoin's Peer-to-Peer Network, 2015, USENIX Security Symposium.

[24] Prateek Saxena, et al. On Power Splitting Games in Distributed Computation: The Case of Bitcoin Pooled Mining, 2015, IEEE 28th Computer Security Foundations Symposium.

[25] Ittay Eyal, et al. The Miner's Dilemma, 2014, 2015 IEEE Symposium on Security and Privacy.

[26] Aviv Zohar, et al. Secure High-Rate Transaction Processing in Bitcoin, 2015, Financial Cryptography.

[27] Aggelos Kiayias, et al. The Bitcoin Backbone Protocol: Analysis and Applications, 2015, EUROCRYPT.

[28] Elaine Shi, et al. FruitChains: A Fair Blockchain, 2017, IACR Cryptol. ePrint Arch.

[29] David Silver, et al. Deep Reinforcement Learning with Double Q-Learning, 2015, AAAI.

[30] Aviv Zohar, et al. Optimal Selfish Mining Strategies in Bitcoin, 2015, Financial Cryptography.

[31] Kartik Nayak, et al. Stubborn Mining: Generalizing Selfish Mining and Combining with an Eclipse Attack, 2016, IEEE European Symposium on Security and Privacy (EuroS&P).

[32] Sanjay Jain, et al. When Cryptocurrencies Mine Their Own Business, 2016, Financial Cryptography.

[33] Karl Tuyls, et al. Markov Security Games: Learning in Spatial Security Problems, 2016.

[34] S. Matthew Weinberg, et al. On the Instability of Bitcoin Without the Block Reward, 2016, CCS.

[35] Demis Hassabis, et al. Mastering the game of Go with deep neural networks and tree search, 2016, Nature.


[37] Wojciech Zaremba, et al. OpenAI Gym, 2016, ArXiv.

[38] Elman Mansimov, et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation, 2017, NIPS.

[39] Demis Hassabis, et al. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, 2017, ArXiv.

[40] Igor Kabashkin, et al. Risk Modelling of Blockchain Ecosystem, 2017, NSS.

[41] Jonathan Katz, et al. Incentivizing Blockchain Forks via Whale Transactions, 2017, Financial Cryptography Workshops.

[42] Yongdae Kim, et al. Be Selfish and Avoid Dilemmas: Fork After Withholding (FAW) Attacks on Bitcoin, 2017, CCS.

[43] Marc Jansen, et al. Short Paper: Revisiting Difficulty Control for Blockchain Systems, 2017, DPM/CBT@ESORICS.

[44] Alec Radford, et al. Proximal Policy Optimization Algorithms, 2017, ArXiv.

[45] Vitalik Buterin, et al. Casper the Friendly Finality Gadget, 2017, ArXiv.

[46] Ethan Heilman, et al. Low-Resource Eclipse Attacks on Ethereum's Peer-to-Peer Network, 2020, IACR Cryptol. ePrint Arch.

[47] Sarah Meiklejohn, et al. Smart contracts for bribing miners, 2018, IACR Cryptol. ePrint Arch.

[48] Peter Henderson, et al. An Introduction to Deep Reinforcement Learning, 2018, Found. Trends Mach. Learn.

[49] Alf Zugenmaier, et al. The Impact of Uncle Rewards on Selfish Mining in Ethereum, 2018, IEEE European Symposium on Security and Privacy Workshops (EuroS&PW).

[50] Michael I. Jordan, et al. RLlib: Abstractions for Distributed Reinforcement Learning, 2017, ICML.

[51] Chao Zhang, et al. Fuzzing: a survey, 2018, Cybersecur.

[52] Lantao Yu, et al. Deep Reinforcement Learning for Green Security Game with Online Information, 2018, AAAI Workshops.

[53] Wei Xu, et al. Scaling Nakamoto Consensus to Thousands of Transactions per Second, 2018, ArXiv.

[54] Edgar R. Weippl, et al. Pay-To-Win: Incentive Attacks on Proof-of-Work Cryptocurrencies, 2019, IACR Cryptol. ePrint Arch.

[55] Chen Feng, et al. Selfish Mining in Ethereum, 2019, IEEE 39th International Conference on Distributed Computing Systems (ICDCS).

[56] Ari Juels, et al. Flash Boys 2.0: Frontrunning, Transaction Reordering, and Consensus Instability in Decentralized Exchanges, 2019, ArXiv.

[57] Mehdi Shajari, et al. Block withholding game among bitcoin mining pools, 2019, Future Gener. Comput. Syst.

[58] T. Başar, et al. Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms, 2019, Handbook of Reinforcement Learning and Control.

[59] Vitalik Buterin, et al. Incentives in Ethereum's Hybrid Casper Protocol, 2019, IEEE International Conference on Blockchain and Cryptocurrency (ICBC).

[60] Lantao Yu, et al. Deep Reinforcement Learning for Green Security Games with Real-Time Information, 2018, AAAI.

[61] Jonathan Katz, et al. Competing (Semi-)Selfish Miners in Bitcoin, 2019, AFT.

[62] Xing Wang, et al. A Deep Dive Into Blockchain Selfish Mining, 2018, ICC 2019 - IEEE International Conference on Communications (ICC).

[63] Serge Fehr, et al. Towards optimal robust secret sharing with security against a rushing adversary, 2019, IACR Cryptol. ePrint Arch.

[64] Jeremy Clark, et al. SoK: Transparent Dishonesty: Front-Running Attacks on Blockchain, 2019, Financial Cryptography Workshops.

[65] Jakub W. Pachocki, et al. Dota 2 with Large Scale Deep Reinforcement Learning, 2019, ArXiv.

[66] Pasin Manurangsi, et al. Nearly Optimal Robust Secret Sharing against Rushing Adversaries, 2020, IACR Cryptol. ePrint Arch.

[67] Alexander Spiegelman, et al. Mind the Mining, 2019, EC.

[68] Igor Mordatch, et al. Emergent Tool Use From Multi-Agent Autocurricula, 2019, ICLR.

[69] Daniel J. Moroz, et al. Selfish Behavior in the Tezos Proof-of-Stake Protocol, 2019, Cryptoeconomic Systems.

[70] Sailik Sengupta, et al. Multi-agent Reinforcement Learning in Bayesian Stackelberg Markov Games for Adaptive Moving Target Defense, 2020, ArXiv.

[71] Vijay Janapa Reddi, et al. Deep Reinforcement Learning for Cyber Security, 2019, IEEE Transactions on Neural Networks and Learning Systems.

[72] Sham M. Kakade, et al. On the Theory of Policy Gradient Methods: Optimality, Approximation, and Distribution Shift, 2019, J. Mach. Learn. Res.