Improving Test Distance for Failure Clustering with Hypergraph Modelling

Automated debugging techniques, such as Fault Localisation (FL) and Automated Program Repair (APR), are typically designed under the Single Fault Assumption (SFA). In practice, however, an unknown number of faults can independently cause multiple test case failures, which makes it difficult both to allocate debugging resources and to apply automated debugging techniques. Clustering algorithms have been applied to group test failures according to their root causes, but their accuracy is often limited by the distance metrics available for test cases. We introduce a new test distance metric based on hypergraphs and evaluate its accuracy using multi-fault benchmarks that we have built on top of Defects4J and SIR. Results show that our technique, Hybiscus, automatically achieves perfect clustering (i.e., the number of clusters equals the ground-truth number of root causes, and all failing tests with the same root cause are grouped together) for 418 out of 605 test runs with multiple test failures. Better failure clustering also allows us to separate different root causes and apply FL techniques under SFA, saving up to 82% of the total wasted effort compared to the state-of-the-art technique for multiple fault localisation.
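The "perfect clustering" criterion stated above can be illustrated with a minimal sketch. It is not taken from Hybiscus; the function name `is_perfect_clustering`, the dictionary-based input format, and the assumption that every failing test has exactly one root cause are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def is_perfect_clustering(predicted, ground_truth):
    """Check the two conditions of perfect clustering.

    Both arguments are hypothetical inputs: dicts mapping a failing test
    name to a label (a cluster id for `predicted`, a root-cause id for
    `ground_truth`). Each test is assumed to have exactly one root cause.
    """
    if predicted.keys() != ground_truth.keys():
        raise ValueError("both labelings must cover the same failing tests")

    # Condition 1: the number of clusters equals the number of root causes.
    same_count = len(set(predicted.values())) == len(set(ground_truth.values()))

    # Condition 2: all failing tests that share a root cause end up
    # in a single predicted cluster.
    clusters_per_cause = defaultdict(set)
    for test, cause in ground_truth.items():
        clusters_per_cause[cause].add(predicted[test])
    grouped = all(len(ids) == 1 for ids in clusters_per_cause.values())

    return same_count and grouped


if __name__ == "__main__":
    truth = {"t1": "faultA", "t2": "faultA", "t3": "faultB"}
    print(is_perfect_clustering({"t1": 0, "t2": 0, "t3": 1}, truth))
    print(is_perfect_clustering({"t1": 0, "t2": 1, "t3": 1}, truth))
```

Taken together, the two conditions require the predicted partition of the failing tests to match the ground-truth partition exactly; the cluster labels themselves are irrelevant.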
