Assurances in Software Testing: A Roadmap

As researchers, we already understand how to make testing more effective and efficient at finding bugs. However, as fuzzing (i.e., automated testing) becomes more widely adopted in practice, practitioners are asking: Which assurances does a fuzzing campaign provide when it exposes no bugs? When is it safe to stop the fuzzer with a reasonable residual risk? How much longer should the fuzzer be run to achieve sufficient coverage? It is time for us to move beyond the innovation of increasingly sophisticated testing techniques, to build a body of knowledge around the explication and quantification of the testing process, and to develop sound methodologies to estimate and extrapolate these quantities with measurable accuracy. In our vision of the future, practitioners leverage a rich statistical toolset to assess residual risk, to obtain statistical guarantees, and to analyze the cost-benefit trade-off for ongoing fuzzing campaigns. We propose a general framework as a starting point to tackle this fundamental challenge and discuss a large number of concrete opportunities for future research.
