A historical, textual analysis approach to feature location

Abstract

Context: Feature location is the task of finding the source code that implements specific functionality in software systems. A common approach is to match the textual information in source code against a query, using Information Retrieval (IR) techniques. To address the paucity of meaningful terms in source code, alternative, relevant source-code descriptions, such as change-sets, could be leveraged for these IR techniques. However, the extent to which these descriptions are useful has not been thoroughly studied.

Objective: This work rigorously characterizes the efficacy of source-code lexical annotation by change-sets (ACIR), in terms of its best-performing configuration.

Method: A tool implementing ACIR was used to study different configurations of the approach and to compare them to a baseline approach (thus allowing comparison against other techniques going forward). This large-scale evaluation employs eight subject systems and 600 features.

Results: It was found that, for ACIR: (1) method-level granularity demands less search effort; (2) using more recent change-sets improves effectiveness; (3) aggregation of recent change-sets by change request decreases effectiveness; (4) naive, text-classification-based filtering of "management" change-sets also decreases effectiveness. In addition, a strongly pronounced dichotomy of subject systems emerged, where one set recorded better feature location using ACIR and the other recorded better feature location using the baseline approach. Finally, merging ACIR and the baseline approach significantly improved performance over both standalone approaches for all systems.

Conclusion: The most fundamental finding is the importance of rigorously characterizing proposed feature location techniques, to identify their optimal configurations. The results also suggest it is important to characterize the software systems under study when selecting the appropriate feature location technique. In the past, configuration of the techniques and characterization of subject systems have not been considered first-class entities in research papers, whereas the results presented here suggest that these factors can have a substantial impact.
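
The general idea of annotating source code with change-set text and then ranking it against a query can be illustrated with a minimal sketch. This is not the authors' tool: the TF-IDF weighting, cosine similarity, tokenizer, function names (`annotate`, `rank`), and the sample change-sets are all assumptions made for illustration. Each method is described only by the messages of the change-sets that touched it, and methods are ranked by similarity to a feature query.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (no stemming)."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def annotate(change_sets):
    """Build one pseudo-document per method from change-set messages.

    change_sets: iterable of (message, [method names touched]) pairs.
    This input shape is a hypothetical simplification.
    """
    docs = defaultdict(list)
    for message, methods in change_sets:
        for method in methods:
            docs[method].extend(tokenize(message))
    return docs


def rank(docs, query):
    """Rank methods by TF-IDF cosine similarity to the query (assumed IR model)."""
    n = len(docs)
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    # Smoothed inverse document frequency over the annotated methods.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}

    def vec(tokens):
        tf = Counter(t for t in tokens if t in idf)
        return {t: c * idf[t] for t, c in tf.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(tokenize(query))
    scores = {m: cosine(vec(toks), q) for m, toks in docs.items()}
    # Highest score first; ties broken by method name for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical change-set history: (commit message, methods touched).
change_sets = [
    ("Fix crash when exporting report to PDF", ["Report.export", "PdfWriter.write"]),
    ("Add login timeout setting", ["Session.login"]),
    ("Speed up PDF export for large reports", ["PdfWriter.write"]),
]

ranked = rank(annotate(change_sets), "export pdf report")
for method, score in ranked:
    print(f"{score:.3f}  {method}")
```

In this sketch, a method whose change-set messages share no terms with the query scores zero and falls to the bottom of the list. Configuration choices such as granularity, which change-sets to include, and how to combine the result with a source-code-based baseline are exactly the kind of factors the study evaluates; none of them are modeled here.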
