Improving Scalability of Symbolic Execution for Software with Complex Environment Interfaces

Manual software testing is laborious and prone to human error, yet it remains the most popular quality assurance method among practitioners. Automating test case generation promises better effectiveness, especially for exposing corner-case bugs. Symbolic execution stands out as an automated testing technique: it produces no false positives, it eventually enumerates all feasible program executions, and it can prioritize executions of interest. However, path explosion (the number of program executions is typically at least exponential in the size of the program) hinders the applicability of symbolic execution in the real world, where software commonly reaches millions of lines of code. In practice, large systems can be executed symbolically in an efficient manner by exploiting their modularity and symbolically executing the different parts of the system separately. However, a component typically depends on its environment to perform its task, so a symbolic execution engine must provide an environment interface that is efficient while maintaining accuracy and completeness. This conundrum is known as the environment problem. Addressing the environment problem systematically is challenging, because each instantiation depends on the nature of the environment and its interface. This thesis addresses two instances of the environment problem in symbolic execution that lie at opposite ends of the spectrum of interface stability: (1) system software interacting with an operating system whose semantics are stable and well documented (e.g., POSIX), and (2) high-level programs written in dynamic languages, such as Python, Ruby, or JavaScript, whose semantics and interfaces continuously evolve.
To address the environment problem for stable operating system interfaces, this thesis introduces the idea of splitting an operating system model into a core set of primitives built into the engine at host level and, on top of them, the full operating system interface emulated inside the guest. As few as two primitives suffice to support an interface as complex as POSIX: threads with synchronization and address spaces with shared memory. We prototyped this idea in the Cloud9 symbolic execution platform. Cloud9's accurate and efficient POSIX model exposes hard-to-reproduce bugs in systems such as UNIX utilities, web servers, and distributed systems. Cloud9 is available at http://cloud9.epfl.ch.

For programs written in high-level interpreted languages, this thesis introduces the idea of using the language interpreter as an "executable language specification". The interpreter runs inside a low-level (e.g., x86) symbolic execution engine while it executes the target program; the aggregate system then acts as a high-level symbolic execution engine for the program. To manage the complexity of symbolically executing the entire interpreter, this thesis introduces Class-Uniform Path Analysis (CUPA), an algorithm that prioritizes paths by grouping them into equivalence classes according to a coverage goal. We built a prototype of these ideas in the form of Chef, a symbolic execution platform for interpreted languages that generates up to 1000 times more tests for popular Python and Lua packages than plain execution of the interpreters. Chef is available at http://dslab.epfl.ch/proj/chef/.
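The class-uniform selection idea behind CUPA can be illustrated with a small sketch (a simplified reconstruction for intuition, not Chef's actual implementation; the function name and classifier are hypothetical): candidate paths are partitioned into equivalence classes by a coverage signature, one class is chosen uniformly at random, and then a path is chosen within that class, so that classes with many redundant paths cannot starve rare but interesting ones.

```python
import random
from collections import defaultdict

def cupa_select(paths, classify, rng=random):
    """Class-uniform path selection (simplified sketch).

    Partition candidate paths into equivalence classes according to
    `classify` (e.g., the high-level program location a path covers),
    pick one class uniformly at random, then pick a path within it.
    """
    classes = defaultdict(list)
    for path in paths:
        classes[classify(path)].append(path)
    chosen_class = rng.choice(sorted(classes))
    return rng.choice(classes[chosen_class])

# Toy example: 99 paths stuck in the interpreter's dispatch loop and
# 1 path reaching a new program location. Uniform choice over the two
# classes picks the rare class about half the time, instead of 1% of
# the time under uniform choice over paths.
paths = [("dispatch_loop", i) for i in range(99)] + [("new_branch", 0)]
picked = cupa_select(paths, classify=lambda p: p[0], rng=random.Random(42))
```

This mirrors the coverage-goal intuition in the abstract: the choice of `classify` determines which notion of "progress" is rewarded.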
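The operating-system layering described above can likewise be sketched in code (a hypothetical Python illustration of the idea, not Cloud9's actual guest-level model, which is written against the engine's host-level primitives): given only the two core primitives, threads with synchronization and shared memory, a richer POSIX abstraction such as a pipe can be emulated entirely at the guest level.

```python
import threading
from collections import deque

class GuestPipe:
    """A blocking byte FIFO analogous to a POSIX pipe, emulated using
    only 'core' primitives: shared state plus threads with
    synchronization (a lock and a condition variable)."""

    def __init__(self):
        self._buf = deque()
        self._lock = threading.Lock()
        self._nonempty = threading.Condition(self._lock)

    def write(self, data: bytes) -> int:
        with self._lock:
            self._buf.extend(data)
            self._nonempty.notify_all()  # wake readers blocked on empty pipe
        return len(data)

    def read(self, n: int) -> bytes:
        with self._lock:
            while not self._buf:          # block until a writer adds data
                self._nonempty.wait()
            count = min(n, len(self._buf))
            return bytes(self._buf.popleft() for _ in range(count))

# A writer thread and a reader sharing the emulated pipe.
pipe = GuestPipe()
writer = threading.Thread(target=pipe.write, args=(b"hello",))
writer.start()
data = pipe.read(5)
writer.join()
```

The design point this illustrates is the one the abstract makes: only the primitives need built-in (and thus symbolic-execution-aware) support; everything above them is ordinary guest code the engine executes like any other.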
