Grapple: A Graph System for Static Finite-State Property Checking of Large-Scale Systems Code

Many real-world bugs in large-scale systems involve object state that is supposed to obey a specified finite state machine (FSM). Such bugs are triggered when unexpected events occur on objects in certain states, causing those objects to transition in ways that violate their specifications. Detecting these FSM-related bugs with static analysis is challenging, especially in distributed systems with large codebases. This paper presents Grapple, a single-machine, disk-based graph system designed for precise and scalable checking of finite-state properties in very large codebases. Grapple detects bugs through context-sensitive, path-sensitive alias and dataflow analyses, both of which are formulated as dynamic transitive-closure computations and automatically parallelized by the system. We propose a novel path-constraint encoding/decoding algorithm that attaches a path constraint to a graph edge, allowing the graph engine to efficiently recover a path and compute its constraint during the computation. We have implemented Grapple and conducted a comprehensive evaluation on widely deployed distributed systems. Grapple reported a total of 376 warnings, of which only 17 are false positives. Our results also demonstrate Grapple's scalability: it took between 51 minutes and 33 hours to finish all analyses on a low-end desktop with 16 GB of memory and a 1 TB SSD, whereas traditional approaches ran out of memory in all cases.
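As a rough illustration of how finite-state property checking can be cast as a transitive-closure computation, the Python sketch below labels each control-flow edge with a state transformer (a tuple mapping every FSM state to its successor) and keeps adding composed summary edges until it reaches a fixpoint. The file-handle FSM, the node names, and the functions `transformer`, `compose`, `transitive_closure`, and `find_violations` are hypothetical and chosen only for illustration. This is not Grapple's algorithm or API. It runs in memory, ignores aliasing as well as context and path sensitivity, and does not implement the paper's path-constraint encoding.

```python
# Minimal sketch: FSM (typestate) checking as a dynamic transitive closure.
# Hypothetical file-handle FSM; not taken from the Grapple paper.
STATES = ("init", "opened", "closed", "error")
EVENTS = {
    "open":  {"init": "opened", "opened": "error", "closed": "opened", "error": "error"},
    "read":  {"init": "error", "opened": "opened", "closed": "error", "error": "error"},
    "close": {"init": "error", "opened": "closed", "closed": "error", "error": "error"},
}

# An edge label maps each state index i to its successor state index.
IDENTITY = tuple(range(len(STATES)))  # epsilon edge: no event on the object


def transformer(event):
    """Encode one FSM event as a state-index mapping."""
    table = EVENTS[event]
    return tuple(STATES.index(table[s]) for s in STATES)


def compose(f, g):
    """Return the transformer that applies f first, then g."""
    return tuple(g[f[i]] for i in range(len(STATES)))


def transitive_closure(edges):
    """Add summary edges (a, c, compose(f, g)) for each (a, b, f), (b, c, g) until fixpoint."""
    closed = set(edges)
    worklist = list(closed)
    while worklist:
        a, b, f = worklist.pop()
        new_edges = []
        for x, y, g in closed:
            if x == b:  # extend the popped edge forward
                new_edges.append((a, y, compose(f, g)))
            if y == a:  # extend the popped edge backward
                new_edges.append((x, b, compose(g, f)))
        for e in new_edges:
            if e not in closed:  # the graph grows as the closure is computed
                closed.add(e)
                worklist.append(e)
    return closed


def find_violations(edges, entry, init_state="init"):
    """Nodes reachable from entry at which some path drives the object into 'error'."""
    start, err = STATES.index(init_state), STATES.index("error")
    return sorted({c for a, c, f in transitive_closure(edges)
                   if a == entry and f[start] == err})


# Usage: open, then a branch that may or may not close, then an unconditional close.
edges = [
    ("n0", "n1", transformer("open")),
    ("n1", "n2", transformer("close")),  # branch A closes
    ("n1", "n2", IDENTITY),              # branch B does not
    ("n2", "n3", transformer("close")),
]
print(find_violations(edges, "n0"))  # ['n3']
```

Because each edge carries a whole state transformer rather than a single state, composed edges act as reusable summaries, and since the set of possible edges (node pairs times transformers) is finite, the closure always terminates. Only the path through branch A closes the handle twice; a real checker would additionally need path sensitivity to decide whether that path is feasible.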
