Correlated Crash Vulnerabilities

Modern distributed storage systems employ complex protocols to update replicated data. In this paper, we study whether such update protocols work correctly in the presence of correlated crashes. We find that the correctness of such protocols hinges on how local file-system state is updated by each replica in the system. We build PACE, a framework that systematically generates and explores persistent states that can occur in a distributed execution. PACE uses a set of generic rules to effectively prune the state space, reducing checking time from days to hours in some cases. We apply PACE to eight widely used distributed storage systems to find correlated crash vulnerabilities, i.e., problems in the update protocol that lead to user-level guarantee violations. PACE finds a total of 26 vulnerabilities across eight systems, many of which lead to severe consequences such as data loss, corrupted data, or unavailable clusters.
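To make the notion of exploring persistent states concrete, the following is a minimal sketch and not PACE's actual algorithm. It assumes a strong simplification: when every node crashes at the same time, each node's disk holds some prefix of the file-system operations it issued, with no reordering and no cross-node dependencies. The trace, the node names, and the helpers `prefix_states`, `correlated_crash_states`, and `majority_has` are all hypothetical and exist only for illustration.

```python
from itertools import product


def prefix_states(ops):
    """Return every on-disk state a node may reach if it crashes after
    persisting some prefix of its operations (assumed: no reordering)."""
    return [tuple(ops[:i]) for i in range(len(ops) + 1)]


def correlated_crash_states(per_node_ops):
    """Yield every global persistent state in which all nodes crash together.

    Each global state picks one prefix state per node, so the state space
    is the Cartesian product of the per-node prefix states.
    """
    nodes = sorted(per_node_ops)
    choices = [prefix_states(per_node_ops[n]) for n in nodes]
    for combo in product(*choices):
        yield dict(zip(nodes, combo))


def majority_has(state, op):
    """Toy user-level guarantee: `op` is persisted on a majority of nodes."""
    holders = sum(1 for persisted in state.values() if op in persisted)
    return holders > len(state) // 2


# Hypothetical trace of file-system operations issued by three replicas.
trace = {
    "n1": ["append(log, x)", "fsync(log)"],
    "n2": ["append(log, x)", "fsync(log)"],
    "n3": ["append(log, x)"],
}

states = list(correlated_crash_states(trace))
violations = [s for s in states if not majority_has(s, "append(log, x)")]

print(len(states), "global crash states explored")
print(len(violations), "states violate the toy majority guarantee")
```

Even in this toy model the number of global states is the product of the per-node state counts, which hints at why pruning rules matter once reordering and longer traces are taken into account.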
