Automatic mining of specifications from invocation traces and method invariants

Software library documentation often describes individual methods' APIs, but not the intended protocols and method interactions. This can lead to library misuse, and restrict runtime detection of protocol violations and automated verification of software that uses the library. Specification mining, if accurate, can help mitigate these issues, which has led to significant research into new model-inference techniques that produce FSM-based models from program invariants and execution traces. However, there is currently a lack of empirical studies that, in a principled way, measure the impact of the inference strategies on model quality. To this end, we identify four such strategies and systematically study the quality of the models they produce for nine off-the-shelf libraries. We find that (1) using invariants to infer an initial model significantly improves model quality, increasing precision by 4% and recall by 41%, on average; (2) effective invariant filtering is crucial for quality and scalability of strategies that use invariants; and (3) using traces in combination with invariants greatly improves robustness to input noise. We present our empirical evaluation, implement new and extend existing model-inference techniques, and make public our implementations, ground-truth models, and experimental data. Our work can lead to higher-quality model inference, and directly improve the techniques and tools that rely on model inference.

[1]  Leonardo Mariani,et al.  Automatic generation of software behavioral models , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[2]  Yuriy Brun,et al.  Using dynamic execution traces and program invariants to enhance behavioral model inference , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[3]  Westley Weimer,et al.  Synthesizing API usage examples , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[4]  Steven P. Reiss,et al.  Encoding program executions , 2001, Proceedings of the 23rd International Conference on Software Engineering. ICSE 2001.

[5]  David Lo,et al.  Automatic steering of behavioral model inference , 2009, ESEC/SIGSOFT FSE.

[6]  Eran Yahav,et al.  Static Specification Mining Using Automata-Based Abstractions , 2007, IEEE Transactions on Software Engineering.

[7]  Andreas Zeller,et al.  Automatically Generating Test Cases for Specification Mining , 2012, IEEE Transactions on Software Engineering.

[8]  Martin P. Robillard,et al.  Creating and evolving developer documentation: understanding the decisions of open source contributors , 2010, FSE '10.

[9]  Bertrand Meyer,et al.  Inferring better contracts , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[10]  Manuvir Das,et al.  Perracotta: mining temporal API rules from imperfect traces , 2006, ICSE.

[11]  David Garlan,et al.  Architectural Mismatch: Why Reuse Is Still So Hard , 2009, IEEE Software.

[12]  Thomas R. Gross,et al.  A framework for the evaluation of specification miners based on finite state machines , 2010, 2010 IEEE International Conference on Software Maintenance.

[13]  Jonathan Aldrich,et al.  An Empirical Study of Object Protocols in the Wild , 2011, ECOOP.

[14]  Neil Walkinshaw,et al.  Inferring Finite-State Models with Temporal Constraints , 2008, 2008 23rd IEEE/ACM International Conference on Automated Software Engineering.

[15]  Edmund M. Clarke,et al.  Counterexample-guided abstraction refinement , 2003, 10th International Symposium on Temporal Representation and Reasoning, 2003 and Fourth International Conference on Temporal Logic. Proceedings..

[16]  Thomas R. Gross,et al.  Leveraging test generation and specification mining for automated bug detection without false positives , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[17]  N. Medvidović,et al.  Automatically Mining Specifications from Invocation Traces and Method Invariants , 2013 .

[18]  Zhendong Su,et al.  Online inference and enforcement of temporal properties , 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[19]  Richard N. Taylor,et al.  Software architecture: foundations, theory, and practice , 2009, 2010 ACM/IEEE 32nd International Conference on Software Engineering.

[20]  Yuriy Brun,et al.  Speculative analysis of integrated development environment recommendations , 2012, OOPSLA '12.

[21]  Carlo Ghezzi,et al.  Mining behavior models from user-intensive web applications , 2014, ICSE.

[22]  David Lo,et al.  Scenario-based and value-based specification mining: better together , 2010, Automated Software Engineering.

[23]  Monica S. Lam,et al.  Automatic extraction of object-oriented component interfaces , 2002, ISSTA '02.

[24]  Jerome A. Feldman,et al.  On the Synthesis of Finite-State Machines from Samples of Their Behavior , 1972, IEEE Transactions on Computers.

[25]  Andreas Zeller,et al.  Mining behavior models from enterprise web applications , 2013, ESEC/FSE 2013.

[26]  Alexander L. Wolf,et al.  Discovering models of software processes from event-based data , 1998, TSEM.

[27]  Martin P. Robillard,et al.  Recovering traceability links between an API and its learning resources , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[28]  Yannis Smaragdakis,et al.  Second-order constraints in dynamic invariant inference , 2013, ESEC/FSE 2013.

[29]  David Lo,et al.  Learning extended FSA from software: An empirical assessment , 2012, J. Syst. Softw..

[30]  Stephen McCamant,et al.  The Daikon system for dynamic detection of likely invariants , 2007, Sci. Comput. Program..

[31]  John Wang,et al.  Data Mining Software , 2008 .

[32]  Siau-Cheng Khoo,et al.  QUARK: Empirical Assessment of Automaton-based Specification Miners , 2006, 2006 13th Working Conference on Reverse Engineering.

[33]  Yuriy Brun,et al.  Unifying FSM-inference algorithms through declarative specification , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[34]  Martin P. Robillard,et al.  What Makes APIs Hard to Learn? Answers from Developers , 2009, IEEE Software.

[35]  Yuriy Brun,et al.  Leveraging existing instrumentation to automatically infer invariant-constrained models , 2011, ESEC/FSE '11.

[36]  Nikolai Tillmann,et al.  DySy: dynamic symbolic execution for invariant inference , 2008, ICSE.

[37]  Sebastián Uchitel,et al.  Enabledness-based program abstractions for behavior validation , 2013, TSEM.

[38]  Sandeep Kumar,et al.  Inferring class level specifications for distributed systems , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[39]  Yuriy Brun,et al.  Inferring models of concurrent systems from logs of their behavior with CSight , 2014, ICSE.

[40]  Yuriy Brun,et al.  Mining precise performance-aware behavioral models from existing instrumentation , 2014, ICSE Companion.

[41]  Zhendong Su,et al.  Javert: fully automatic mining of general temporal properties from dynamic traces , 2008, SIGSOFT '08/FSE-16.

[42]  Yuriy Brun,et al.  Behavioral resource-aware model inference , 2014, ASE.

[43]  Bertrand Meyer,et al.  A comparative study of programmer-written and automatically inferred contracts , 2009, ISSTA.

[44]  Sebastián Uchitel,et al.  Automated Abstractions for Contract Validation , 2012, IEEE Transactions on Software Engineering.

[45]  Kim G. Larsen,et al.  A modal process logic , 1988, [1988] Proceedings. Third Annual Information Symposium on Logic in Computer Science.

[46]  Grigore Rosu,et al.  Mining parametric specifications , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[47]  Michael D. Ernst,et al.  Combined static and dynamic automated test generation , 2011, ISSTA '11.

[48]  Yuriy Brun,et al.  Synthesizing partial component-level behavior models from system specifications , 2009, ESEC/FSE '09.

[49]  Siau-Cheng Khoo,et al.  SMArTIC: towards building an accurate, robust and scalable specification miner , 2006, SIGSOFT '06/FSE-14.

[50]  Chao Liu,et al.  Data Mining for Software Engineering , 2009, Computer.

[51]  Martin P. Robillard,et al.  Asking and answering questions about unfamiliar APIs: An exploratory study , 2012, 2012 34th International Conference on Software Engineering (ICSE).

[52]  Michael D. Ernst,et al.  Automatically patching errors in deployed software , 2009, SOSP '09.

[53]  Mira Mezini,et al.  Learning from examples to improve code completion systems , 2009, ESEC/SIGSOFT FSE.

[54]  Andreas Zeller,et al.  Mining object behavior with ADABU , 2006, WODA '06.

[55]  Romain Robbes,et al.  How Program History Can Improve Code Completion , 2008, 2008 23rd IEEE/ACM International Conference on Automated Software Engineering.