Exploiting user feedback to facilitate observation-based testing

Recent progress in the use of data mining and statistical techniques to automatically classify related software failures and to localize defects suggests that, if appropriate information is collected about executions of deployed software, such techniques can help developers prioritize action on software failures reported by users and diagnose their causes. Users, however, are not reliable judges of correct software behavior: they may overlook real failures, neglect to report failures they do observe, or report spurious failures. Instead, I propose to employ users as independent checks on one another. Previous work demonstrated that executions with similar execution profiles often represent similar program behavior. By grouping similar executions together, developers can use user-submitted labels to corroborate one another: similar executions with the same label represent consensus, while similar executions with differing labels indicate suspicious or confusing behavior. An empirical evaluation of two proposed techniques, Corroboration-Based Filtering and Review-All-Failures plus k-Nearest Neighbors, indicates that they discover significantly more failures and defects than the naive Review-All-Failures (RAF) strategy. A third technique, round-robin cluster sampling, discovers failures and defects more quickly than RAF.
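To make the corroboration idea concrete, the following is a minimal sketch, not the author's implementation: executions are clustered by profile similarity, user-submitted labels within each cluster are compared, and clusters whose members disagree are flagged for developer review; a simple round-robin pass over the clusters then illustrates round-robin cluster sampling. The synthetic profiles, the cluster count, and all variable names are illustrative assumptions.

```python
# Sketch of corroboration-based filtering and round-robin cluster sampling
# over execution profiles with user-submitted pass/fail labels.
from collections import Counter, deque

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical data: one row per execution, one column per profiled feature
# (e.g., function call counts); labels are user reports of "pass" or "fail".
profiles = rng.poisson(lam=3.0, size=(60, 8)).astype(float)
labels = rng.choice(["pass", "fail"], size=60, p=[0.8, 0.2]).tolist()

# Group executions with similar execution profiles.
clusters = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(profiles)

# Corroboration-based filtering: clusters whose labels agree represent
# consensus; clusters with conflicting labels are flagged as suspicious.
suspicious = []
for c in np.unique(clusters):
    members = np.flatnonzero(clusters == c)
    counts = Counter(labels[i] for i in members)
    if len(counts) > 1:  # conflicting user labels within similar executions
        suspicious.append((int(c), dict(counts)))

print("clusters flagged for review:", suspicious)

# Round-robin cluster sampling: review one execution per cluster per round,
# so small clusters (often rare behaviors) are reached early.
queues = {c: deque(np.flatnonzero(clusters == c)) for c in np.unique(clusters)}
review_order = []
while any(queues.values()):
    for q in queues.values():
        if q:
            review_order.append(int(q.popleft()))

print("first executions to review:", review_order[:10])
```

In practice the labels would come from user feedback on deployed executions rather than a random draw, and the clustering and distance measure would be chosen to match the profile representation; the sketch only shows how consensus and conflict within clusters can be computed and how a round-robin review order can be derived.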
