An approach for bug localization in models using two levels: model and metamodel

Bug localization is a common task in software engineering, especially when maintaining and evolving software products. This paper introduces BLiM2, an approach for bug localization in models that, in contrast to existing approaches for source code, takes advantage of the domain information found in the model and in its metamodel. BLiM2 adapts the two ideas used for bug localization in source code, namely textual similarity to the bug description and the Defect Localization Principle, and applies them at both levels. We evaluated our approach on a real-world industrial case study from BSH in the induction hob domain, measuring the results in terms of recall, precision, their combination (the F-measure), and the Matthews correlation coefficient (MCC). Our study shows that the BLiM2 variant that combines model and metamodel information for the textual similarity, and that distinguishes between the modification timespans of the model and of the metamodel, provides the best results in this work. We also performed a statistical analysis to provide evidence of the significance of the results: there are significant differences in performance between the best BLiM2 variant and the approach used by our industrial partner. Finally, the effect size statistic reveals that, even in the worst case, the best BLiM2 variant obtains better results 78% of the time.
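
The two ranking signals described above can be illustrated with a small sketch. It is not the authors' implementation: the abstract names neither the information retrieval technique nor the way the signals are combined, so plain TF-IDF cosine similarity stands in for the textual-similarity step, the Defect Localization Principle (recently modified elements are more likely to contain the bug) is modeled as an exponential decay over modification age, and the two signals are merged with a weighted sum. The names `Fragment` and `rank`, the half-lives, and the weight `alpha` are all hypothetical. Loosely following the best variant, each fragment's text concatenates model and metamodel terms, and each level gets its own timespan (half-life).

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Fragment:
    """Hypothetical candidate model fragment (a group of model elements)."""
    name: str
    model_text: str            # names/values of the model elements
    metamodel_text: str        # names of the metamodel concepts they instantiate
    model_age_days: float      # days since the fragment last changed in the model
    metamodel_age_days: float  # days since those concepts last changed in the metamodel


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """TF-IDF weights with a smoothed IDF, log(N / df) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: tf * (math.log(n / df[t]) + 1.0) for t, tf in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recency(age_days: float, half_life_days: float) -> float:
    """Exponential decay: 1.0 for a change made today, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life_days)


def rank(bug_report: str, fragments: list[Fragment],
         model_half_life: float = 30.0, metamodel_half_life: float = 180.0,
         alpha: float = 0.7) -> list[tuple[str, float]]:
    """Rank fragments by alpha * similarity + (1 - alpha) * recency (hypothetical weights)."""
    # Textual similarity: each fragment's text combines model and metamodel terms.
    docs = [tokenize(f.model_text + " " + f.metamodel_text) for f in fragments]
    vectors = tfidf(docs + [tokenize(bug_report)])
    query = vectors[-1]
    scored = []
    for frag, vec in zip(fragments, vectors[:-1]):
        similarity = cosine(query, vec)
        # Separate timespans: model and metamodel changes decay at different rates.
        rec = (0.5 * recency(frag.model_age_days, model_half_life)
               + 0.5 * recency(frag.metamodel_age_days, metamodel_half_life))
        scored.append((frag.name, alpha * similarity + (1 - alpha) * rec))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


fragments = [
    Fragment("PowerManager", "inductor power level boost", "Inductor PowerChannel", 5, 400),
    Fragment("UserPanel", "display touch slider", "Panel Button", 200, 400),
]
print(rank("Boost power level not applied to inductor", fragments))
```

Giving the metamodel a longer half-life than the model is only an illustrative choice; in a real setting these weights would have to be tuned on the case-study data.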

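The evaluation measures named in the abstract can also be made concrete. This sketch assumes that a located fragment is compared element by element against an oracle fragment, so that each model element counts as a true or false positive or negative; that framing and the function names are hypothetical. It also assumes that the unnamed effect size is Vargha and Delaney's Â12, whose value reads directly as "the probability that one approach beats the other", matching the abstract's "better results 78% of the time".

```python
import math


def confusion(predicted: set[str], oracle: set[str], universe: set[str]) -> tuple[int, int, int, int]:
    """Element-level confusion counts of a located fragment against the oracle."""
    tp = len(predicted & oracle)
    fp = len(predicted - oracle)
    fn = len(oracle - predicted)
    tn = len(universe - predicted - oracle)
    return tp, fp, fn, tn


def evaluate(predicted: set[str], oracle: set[str], universe: set[str]) -> dict[str, float]:
    tp, fp, fn, tn = confusion(predicted, oracle, universe)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # MCC uses all four cells; it is set to 0 by convention when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f_measure": f_measure, "mcc": mcc}


def vargha_delaney_a12(xs: list[float], ys: list[float]) -> float:
    """Probability that a run from xs scores higher than a run from ys (ties count 1/2)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    ties = sum(1 for x in xs for y in ys if x == y)
    return (greater + 0.5 * ties) / (len(xs) * len(ys))


universe = {f"e{i}" for i in range(1, 11)}
print(evaluate({"e1", "e2", "e4"}, {"e1", "e2", "e3"}, universe))
print(vargha_delaney_a12([0.8, 0.9, 0.7], [0.6, 0.7, 0.5]))
```

For this toy data, precision, recall and F-measure are all 2/3, the MCC is 11/21 (about 0.52), and Â12 is 8.5/9 (about 0.94). Under this reading, an Â12 of 0.78 means that in 78% of the pairwise comparisons between runs the first approach scored higher, with ties counting as half.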