Integrating conceptual and logical couplings for change impact analysis in software

This paper presents an approach that combines conceptual and evolutionary techniques to support change impact analysis in source code. Conceptual couplings capture the extent to which domain concepts and software artifacts are related to each other. This information is derived using Information Retrieval-based analysis of the textual software artifacts found in a single version of the software (e.g., comments and identifiers in a single snapshot of source code). Evolutionary couplings capture the extent to which software artifacts were co-changed. This information is derived by analyzing patterns, relationships, and other relevant information about source code changes mined from multiple versions in software repositories. The premise is that combining the two methods improves the accuracy of impact sets compared to either approach used alone. A rigorous empirical assessment on the changes of the open source systems Apache httpd, ArgoUML, iBatis, KOffice, and jEdit is also reported. The impact sets are evaluated at the file and method levels of granularity for all the software systems considered in the empirical evaluation. The results show that a combination of conceptual and evolutionary techniques, across several cut-off points and periods of history, provides statistically significant improvements in accuracy over either of the two techniques used independently. Improvements in F-measure of up to 14 percentage points (from 3% to 17%) over the conceptual technique in ArgoUML at the method granularity, and of up to 21 percentage points (from 9% to 30%) over the evolutionary technique in iBatis at the file granularity, are reported.
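To make the evaluation setup concrete, the following is a minimal sketch of how an estimated impact set might be cut off, combined, and scored with the F-measure. The abstract does not say how the two techniques are combined, so the union used here is only an assumption for illustration, and the function names (`top_k`, `combine_union`, `f_measure`) and file names are hypothetical.

```python
from typing import Iterable, List, Set


def top_k(ranked: Iterable[str], k: int) -> List[str]:
    """Keep the first k entities of a ranked list (the cut-off point)."""
    return list(ranked)[:k]


def combine_union(conceptual: List[str], evolutionary: List[str], k: int) -> Set[str]:
    """Assumed combination rule: union of the top-k entities of each technique."""
    return set(top_k(conceptual, k)) | set(top_k(evolutionary, k))


def f_measure(estimated: Set[str], actual: Set[str]) -> float:
    """Harmonic mean of precision and recall of an estimated impact set."""
    hits = len(estimated & actual)
    if hits == 0:
        return 0.0
    precision = hits / len(estimated)
    recall = hits / len(actual)
    return 2 * precision * recall / (precision + recall)


# Toy, made-up rankings for a single change; not data from the paper.
conceptual_ranking = ["a.c", "b.c", "c.c"]    # ranked by textual similarity
evolutionary_ranking = ["d.c", "x.c", "a.c"]  # ranked by co-change strength
actual_impact_set = {"a.c", "d.c"}            # entities that really changed

k = 2
f_conceptual = f_measure(set(top_k(conceptual_ranking, k)), actual_impact_set)
f_evolutionary = f_measure(set(top_k(evolutionary_ranking, k)), actual_impact_set)
f_combined = f_measure(
    combine_union(conceptual_ranking, evolutionary_ranking, k), actual_impact_set
)
print(f_conceptual, f_evolutionary, f_combined)
```

In this toy case each technique alone recovers one of the two changed files, while the union recovers both at the cost of a larger estimated set. Whether a combination helps in practice depends on the system, the cut-off point, and the history used, which is what the empirical assessment measures.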
