Coverage guided, property based testing

Property-based random testing, exemplified by frameworks such as Haskell's QuickCheck, works by testing an executable predicate (a property) on a stream of randomly generated inputs. Property testing works very well in many cases, but not always. Some properties are conditioned on the input satisfying demanding semantic invariants that are not consequences of its syntactic structure---e.g., that an input list must be sorted or have no duplicates. Most randomly generated inputs fail to satisfy properties with such sparse preconditions, and so are simply discarded. As a result, much of the target system may go untested. We address this issue with a novel technique called coverage guided, property based testing (CGPT). Our approach is inspired by the related area of coverage guided fuzzing, exemplified by tools like AFL. Rather than just generating a fresh random input at each iteration, CGPT can also produce new inputs by mutating previous ones using type-aware, generic mutator operators. The target program is instrumented to track which control flow branches are executed during a run and inputs whose runs expand control-flow coverage are retained for future mutations. This means that, when sparse conditions in the target are satisfied and new coverage is observed, the input that triggered them will be retained and used as a springboard to go further. We have implemented CGPT as an extension to the QuickChick property testing tool for Coq programs; we call our implementation FuzzChick. We evaluate FuzzChick on two Coq developments for abstract machines that aim to enforce flavors of noninterference, which has a (very) sparse precondition. We systematically inject bugs in the machines' checking rules and use FuzzChick to look for counterexamples to the claim that they satisfy a standard noninterference property. We find that vanilla QuickChick almost always fails to find any bugs after a long period of time, as does an earlier proposal for combining property testing and fuzzing. In contrast, FuzzChick often finds them within seconds to minutes. Moreover, FuzzChick is almost fully automatic; although highly tuned, hand-written generators can find the bugs faster, they require substantial amounts of insight and manual effort.

[1]  Pablo Buiras,et al.  QuickFuzz testing for fun and profit , 2017, J. Syst. Softw..

[2]  Lukas Bulwahn,et al.  Smart Testing of Functional Programs in Isabelle , 2012, LPAR.

[3]  Rishabh Singh,et al.  Learn&Fuzz: Machine learning for input fuzzing , 2017, 2017 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[4]  Benjamin C. Pierce,et al.  Foundational Property-Based Testing , 2015, ITP.

[5]  Thomas H. Austin,et al.  Efficient purely-dynamic information flow analysis , 2009, PLAS '09.

[6]  Andrew Ruef,et al.  Evaluating Fuzz Testing , 2018, CCS.

[7]  Jasmin Christian Blanchette,et al.  Automatic proofs and refutations for higher-order logic , 2012 .

[8]  Andrew E. Santosa,et al.  Smart Greybox Fuzzing , 2018, IEEE Transactions on Software Engineering.

[9]  Benjamin C. Pierce,et al.  SAFE: A clean-slate architecture for secure systems , 2013, 2013 IEEE International Conference on Technologies for Homeland Security (HST).

[10]  Hao Chen,et al.  Angora: Efficient Fuzzing by Principled Search , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[11]  Daniel Jackson,et al.  Software Abstractions - Logic, Language, and Analysis , 2006 .

[12]  Benjamin C. Pierce,et al.  A verified information-flow architecture , 2014, J. Comput. Secur..

[13]  Christopher Krügel,et al.  DIFUZE: Interface Aware Fuzzing for Kernel Drivers , 2017, CCS.

[14]  Deian Stefan,et al.  Hails: Protecting Data Privacy in Untrusted Web Applications , 2012, OSDI.

[15]  Pablo Buiras,et al.  QuickFuzz: an automatic random fuzzer for common file formats , 2016, Haskell.

[16]  Simon Cruanes,et al.  Extending Nunchaku to Dependent Type Theory , 2016, HaTT@IJCAR.

[17]  J. Meseguer,et al.  Security Policies and Security Models , 1982, 1982 IEEE Symposium on Security and Privacy.

[18]  Meng Xu,et al.  QSYM : A Practical Concolic Execution Engine Tailored for Hybrid Fuzzing , 2018, USENIX Security Symposium.

[19]  Anja Feldmann,et al.  Static Program Analysis as a Fuzzing Aid , 2017, RAID.

[20]  Christopher Krügel,et al.  Driller: Augmenting Fuzzing Through Selective Symbolic Execution , 2016, NDSS.

[21]  Zhong Shao,et al.  CertiKOS: An Extensible Architecture for Building Certified Concurrent OS Kernels , 2016, OSDI.

[22]  Xavier Leroy,et al.  Formal verification of a realistic compiler , 2009, CACM.

[23]  B. Pierce,et al.  A Coq Framework For Verified Property-Based Testing ( Extended Abstract ) , 2014 .

[24]  Benjamin C. Pierce,et al.  A Theory of Information-Flow Labels , 2013, 2013 IEEE 26th Computer Security Foundations Symposium.

[25]  Koen Claessen,et al.  QuickCheck: a lightweight tool for random testing of Haskell programs , 2000, ICFP.

[26]  Adam Kiezun,et al.  Grammar-based whitebox fuzzing , 2008, PLDI '08.

[27]  Patrice Godefroid,et al.  Automated Whitebox Fuzz Testing , 2008, NDSS.

[28]  Andrew C. Myers,et al.  Programming Languages for Information Security , 2002 .

[29]  Lukas Bulwahn,et al.  The New Quickcheck for Isabelle - Random, Exhaustive and Symbolic Testing under One Roof , 2012, CPP.

[30]  Andrew C. Myers,et al.  Language-based information-flow security , 2003, IEEE J. Sel. Areas Commun..

[31]  Sarfraz Khurshid,et al.  Test generation through programming in UDITA , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[32]  B. Pierce,et al.  QuickChick: Property-based testing for Coq , 2014 .

[33]  Koen Claessen,et al.  Splittable pseudorandom number generators using cryptographic hashing , 2013, Haskell '13.

[34]  Nikolaj Bjørner,et al.  Z3: An Efficient SMT Solver , 2008, TACAS.

[35]  Koen Claessen,et al.  Making Random Judgments: Automatically Generating Well-Typed Terms from the Definition of a Type-System , 2015, ESOP.

[36]  Barton P. Miller,et al.  An empirical study of the reliability of UNIX utilities , 1990, Commun. ACM.

[37]  Konstantinos Sagonas,et al.  A PropEr integration of types and function specifications with property-based testing , 2011, Erlang Workshop.

[38]  Deian Stefan,et al.  Disjunction Category Labels , 2011, NordSec.

[39]  Koen Claessen,et al.  Generating constrained random data with uniform distribution , 2014, Journal of Functional Programming.

[40]  Chao Zhang,et al.  CollAFL: Path Sensitive Fuzzing , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[41]  Deian Stefan,et al.  Flexible dynamic information flow control in Haskell , 2012, Haskell '11.

[42]  Abhik Roychoudhury,et al.  Coverage-Based Greybox Fuzzing as Markov Chain , 2016, IEEE Transactions on Software Engineering.

[43]  Leonidas Lampropoulos,et al.  Random Testing for Language Design , 2018 .

[44]  Mathias Payer,et al.  T-Fuzz: Fuzzing by Program Transformation , 2018, 2018 IEEE Symposium on Security and Privacy (SP).

[45]  Ranjit Jhala,et al.  Type Targeted Testing , 2014, ESOP.

[46]  Yang Liu,et al.  Superion: Grammar-Aware Greybox Fuzzing , 2018, 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE).

[47]  Emina Torlak,et al.  Kodkod: A Relational Model Finder , 2007, TACAS.

[48]  Tjark Weber,et al.  Bounded Model Generation for Isabelle/HOL , 2005, D/PDPAR@IJCAR.

[49]  Koushik Sen,et al.  FairFuzz: A Targeted Mutation Strategy for Increasing Greybox Fuzz Testing Coverage , 2018, 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE).

[50]  Benjamin C. Pierce,et al.  Testing noninterference, quickly , 2013, Journal of Functional Programming.

[51]  Dawson R. Engler,et al.  KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs , 2008, OSDI.

[52]  Herbert Bos,et al.  TIFF: Using Input Type Inference To Improve Fuzzing , 2018, ACSAC.

[53]  Yang Liu,et al.  Skyfire: Data-Driven Seed Generation for Fuzzing , 2017, 2017 IEEE Symposium on Security and Privacy (SP).

[54]  Benjamin C. Pierce,et al.  Generating good generators for inductive relations , 2017, Proc. ACM Program. Lang..

[55]  Tobias Nipkow,et al.  Nitpick: A Counterexample Generator for Higher-Order Logic Based on a Relational Model Finder , 2010, ITP.

[56]  Alexander Aiken,et al.  Synthesizing program input grammars , 2016, PLDI.

[57]  Benjamin C. Pierce,et al.  Beginner's luck: a language for property-based generators , 2016, POPL.

[58]  Benjamin C. Pierce,et al.  All Your IFCException Are Belong to Us , 2013, 2013 IEEE Symposium on Security and Privacy.