Training Binary Classifiers as Data Structure Invariants

We present a technique to distinguish valid from invalid data structure objects. The technique is based on building an artificial neural network, more precisely a binary classifier, and training it to identify valid and invalid instances of a data structure. The obtained classifier can then be used in place of the data structure's invariant, in order to attempt to identify (in)correct behaviors in programs manipulating the structure. In order to produce the valid objects to train the network, an assumed-correct set of object building routines is randomly executed. Invalid instances are produced by generating values for object fields that "break" the collected valid values, i.e., that assign values to object fields that have not been observed as feasible in the assumed-correct executions that led to the collected valid instances. We experimentally assess this approach, over a benchmark of data structures. We show that this learning technique produces classifiers that achieve significantly better accuracy in classifying valid/invalid objects compared to a technique for dynamic invariant detection, and leads to improved bug finding.

[1]  Tiziana Margaria,et al.  Tools and algorithms for the construction and analysis of systems: a special issue for TACAS 2017 , 2001, International Journal on Software Tools for Technology Transfer.

[2]  Emina Torlak,et al.  Kodkod: A Relational Model Finder , 2007, TACAS.

[3]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[4]  Mike Barnett Code Contracts for .NET: Runtime Verification and So Much More , 2010, RV.

[5]  Bertrand Meyer,et al.  Automatic testing of object-oriented software , 2011 .

[6]  Michael D. Ernst,et al.  Feedback-Directed Random Test Generation , 2007, 29th International Conference on Software Engineering (ICSE'07).

[7]  Anthony J. H. Simons,et al.  JWalk: a tool for lazy, systematic testing of java classes by design introspection and user interaction , 2007, Automated Software Engineering.

[8]  Carsten Sinz,et al.  LLBMC: Bounded Model Checking of C and C++ Programs Using a Compiler IR , 2012, VSTTE.

[9]  Siti Zaiton Mohd Hashim,et al.  Artificial neural networks as multi-networks automated test oracle , 2011, Automated Software Engineering.

[10]  Alexander Aiken,et al.  From invariant checking to invariant inference using randomized search , 2014, Formal Methods Syst. Des..

[11]  Sarfraz Khurshid,et al.  TestEra: A tool for testing Java programs using alloy specifications , 2011, 2011 26th IEEE/ACM International Conference on Automated Software Engineering (ASE 2011).

[12]  Sarfraz Khurshid,et al.  Test generation through programming in UDITA , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[13]  Daniel Jackson,et al.  Software Abstractions - Logic, Language, and Analysis , 2006 .

[14]  Somesh Jha,et al.  Isomorph-free model enumeration: a new method for checking relational specifications , 1998, TOPL.

[15]  簡聰富,et al.  物件導向軟體之架構(Object-Oriented Software Construction)探討 , 1989 .

[16]  Nazareno Aguirre,et al.  Field-exhaustive testing , 2016, SIGSOFT FSE.

[17]  Sarfraz Khurshid,et al.  History-Aware Data Structure Repair Using SAT , 2012, TACAS.

[18]  Bertrand Meyer,et al.  Object-Oriented Software Construction, 2nd Edition , 1997 .

[19]  Gary T. Leavens,et al.  Beyond Assertions: Advanced Specification and Verification with JML and ESC/Java2 , 2005, FMCO.

[20]  Kurt Mehlhorn,et al.  Review of algorithms and data structures: the basic toolbox by Kurt Mehlhorn and Peter Sanders , 2011, SIGA.

[21]  Sriram K. Rajamani,et al.  The YogiProject: Software Property Checking via Static Analysis and Testing , 2009, TACAS.

[22]  Claire Le Goues,et al.  GenProg: A Generic Method for Automatic Software Repair , 2012, IEEE Transactions on Software Engineering.

[23]  Mark Harman,et al.  The Oracle Problem in Software Testing: A Survey , 2015, IEEE Transactions on Software Engineering.

[24]  Bertrand Meyer Design By Contract. The Eiffel Method , 1998, Proceedings. Technology of Object-Oriented Languages. TOOLS 26 (Cat. No.98EX176).

[25]  Nikolai Tillmann,et al.  DySy: dynamic symbolic execution for invariant inference , 2008, ICSE.

[26]  Marcelo F. Frias,et al.  Analysis of invariants for efficient bounded verification , 2010, ISSTA '10.

[27]  Sarfraz Khurshid,et al.  Korat: automated testing based on Java predicates , 2002, ISSTA '02.

[28]  Nazareno Aguirre,et al.  Improving Test Generation under Rich Contracts by Tight Bounds and Incremental SAT Solving , 2013, 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation.

[29]  Christel Baier,et al.  Tools and Algorithms for the Construction and Analysis of Systems , 2015, Lecture Notes in Computer Science.

[30]  Stephen McCamant,et al.  The Daikon system for dynamic detection of likely invariants , 2007, Sci. Comput. Program..

[31]  Isabelle Guyon,et al.  An Introduction to Variable and Feature Selection , 2003, J. Mach. Learn. Res..

[32]  Barbara Liskov,et al.  Program Development in Java - Abstraction, Specification, and Object-Oriented Design , 1986 .

[33]  Yoshua Bengio,et al.  Random Search for Hyper-Parameter Optimization , 2012, J. Mach. Learn. Res..

[34]  Gregg Rothermel,et al.  Supporting Controlled Experimentation with Testing Techniques: An Infrastructure and its Potential Impact , 2005, Empirical Software Engineering.

[35]  Xiangyu Zhang,et al.  Locating faults through automated predicate switching , 2006, ICSE.

[36]  Sarfraz Khurshid,et al.  Generalized Symbolic Execution for Model Checking and Testing , 2003, TACAS.

[37]  Siti Zaiton Mohd Hashim,et al.  An automated framework for software test oracle , 2011, Inf. Softw. Technol..

[38]  Claire Le Goues,et al.  Automatically finding patches using genetic programming , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[39]  Felix Sheng-Ho Chang,et al.  Modular verification of code with SAT , 2006, ISSTA '06.

[40]  Peter Norvig,et al.  Artificial Intelligence: A Modern Approach , 1995 .

[41]  Rui Abreu,et al.  A Survey on Software Fault Localization , 2016, IEEE Transactions on Software Engineering.

[42]  Corina S. Pasareanu,et al.  Symbolic PathFinder: integrating symbolic execution with model checking for Java bytecode analysis , 2013, Automated Software Engineering.