The State of Empirical Evaluation in Static Feature Location

Feature location (FL) is the task of finding the source code that implements a specific, user-observable functionality in a software system. It plays a key role in many software maintenance tasks, and researchers have proposed a wide variety of Feature Location Techniques (FLTs) that rely on source code structure or textual analysis. As FLTs evolve and novel FLTs are introduced, comparison studies are needed to investigate “Which are the best FLTs?” However, an initial reading of the literature suggests that performing such comparisons would be arduous because of the large number of techniques to be compared, the heterogeneous nature of the empirical designs, and the lack of transparency in the literature. This article presents a systematic review of 170 FLT articles published between 2000 and 2015. The results indicate that 95% of the articles studied are directed towards novelty, in that they propose a novel FLT. Sixty-nine percent of these novel FLTs are evaluated through standard empirical methods but, of those, only 9% use baseline technique(s) in their evaluations to allow cross-comparison with other techniques. The heterogeneity of empirical evaluation is also clearly apparent: across the 170 articles, over 60 different FLT evaluation metrics, 272 subject systems, and 235 different benchmarks are used. The review also identifies numerous user input formats as contributing to the heterogeneity. Analysis of the existing research further suggests that only 27% of the FLTs presented might be reproduced from the published material. These findings suggest that comparison across the existing body of FLT evaluations is very difficult.
We conclude by providing guidelines for the empirical evaluation of FLTs. The guidelines are cognisant of FLTs with different goals, leverage common practices in existing empirical evaluations, and are accompanied by rationales. They are intended as a step towards standardising evaluation in the field, thus facilitating comparison across FLTs.
