A Structure-Driven Method for Information Retrieval-Based Software Change Impact Analysis

An important application of information retrieval technology is software change impact analysis. Existing information retrieval-based change impact analysis methods select a single model to transform the source code corpus into vectors, a process known as indexing. That model is chosen from two primary options, the bag-of-words model and the word embedding model, each with its own advantages and disadvantages. The bag-of-words model records every word in the source code but ignores contextual information in the corpus. The word embedding model records the contextual information but loses detail for individual words. To address this problem, we propose a structure-driven method for information retrieval-based change impact analysis (named SDM-CIA). SDM-CIA integrates the bag-of-words and word embedding models based on the software’s structure. Our experiments on a standard benchmark show that, compared with existing methods, SDM-CIA improves precision, recall, F-score, and mean reciprocal rank (MRR) by an average of 3.65%, 3.82%, 3.6%, and 10.28%, respectively. These results confirm the effectiveness of SDM-CIA.
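The trade-off between the two indexing models can be illustrated with a minimal Python sketch. It is not the SDM-CIA method, whose integration step is not detailed here. The token lists, the two-dimensional vectors in `TOY_EMBEDDINGS`, and all function names are hypothetical and chosen only for illustration; a real word embedding model would learn its vectors from the corpus.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bow_vector(tokens, vocabulary):
    """Bag-of-words: one count per vocabulary word, with no notion of context."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


# Hypothetical hand-made embeddings (assumption): related words get nearby vectors.
TOY_EMBEDDINGS = {
    "delete": [0.9, 0.1],
    "remove": [0.8, 0.2],
    "file": [0.1, 0.9],
    "record": [0.2, 0.8],
}


def embedding_vector(tokens, embeddings):
    """Embedding index: average the word vectors, so individual words blur together."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return []
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


change_request = ["delete", "file"]   # hypothetical query tokens
method_tokens = ["remove", "record"]  # hypothetical tokens from a source method

vocabulary = sorted(set(change_request) | set(method_tokens))
bow_sim = cosine(bow_vector(change_request, vocabulary),
                 bow_vector(method_tokens, vocabulary))
emb_sim = cosine(embedding_vector(change_request, TOY_EMBEDDINGS),
                 embedding_vector(method_tokens, TOY_EMBEDDINGS))

print(f"bag-of-words similarity: {bow_sim:.2f}")
print(f"embedding similarity:    {emb_sim:.2f}")
```

With these toy inputs the bag-of-words similarity is zero because the two token lists share no word, while the averaged embeddings are nearly identical even though the individual words differ. This mirrors the complementary weaknesses described above and suggests why combining the two models may help.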
