An empirical study on the interplay between semantic coupling and co-change of software classes

The evolution of software systems is an inevitable process that must be managed effectively to enhance software quality. Change impact analysis (CIA) is a technique that identifies impact sets, i.e., the sets of classes that require correction as a result of a change made to a class or artefact. These sets can also be regarded as ripple effects and are typically non-local: changes propagate to different parts of a system. Two classes are considered logically coupled if they have co-changed in the past; past research has shown that the precision of CIA techniques increases if logical and semantic coupling (i.e., the extent to which the lexical content of two classes is related) are both considered. However, the relationship between semantic and logical coupling of software artefacts has not been extensively studied, and no dependencies have been established between these two types of coupling. Are two artefacts that often co-change also strongly connected from a semantic point of view? Are two semantically similar artefacts bound to co-change in the future? Answering these questions would help increase the precision of CIA. It would also help software maintainers to focus on a smaller subset of artefacts that are more likely to co-evolve in the future. This study investigated the relationship between semantic and logical coupling. Using chi-squared statistical tests, we identified similarities between semantic coupling computed from class corpora and semantic coupling computed from class identifiers. We then computed Spearman's rank correlation between semantic and logical coupling metrics for class pairs to detect whether semantic and logical relationships co-varied in object-oriented (OO) software. Finally, we investigated the overlap between semantic and logical relationships by identifying the proportion of classes linked through both coupling types. Our empirical study and results were based on seventy-nine open-source software projects.
Results showed that: (a) measuring the semantic similarity of classes by using their identifiers is computationally efficient; (b) identifier-based coupling can be used interchangeably with semantic similarity based on class corpora, albeit not always; (c) no correlation was found between the strengths of semantic and change (i.e., logical) coupling; and (d) a directional relationship between the two was identified: 70% of semantic dependencies are also linked through change coupling, but not vice versa. Based on our findings, we conclude that identifying more efficient methods of computing semantic coupling, together with the directional relationship between semantic and change dependencies, could help to improve CIA methods that integrate semantic coupling information. This may also help to reveal implicit dependencies not captured by static source code analysis.
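
The analysis steps named in the abstract can be illustrated with a minimal Python sketch. It is not the study's implementation: the helper names, the use of Jaccard similarity over identifier terms as the semantic measure, and the use of raw co-change counts as the logical measure are all assumptions made here for illustration. The sketch shows how a rank correlation between two coupling strengths and a directional overlap between two sets of linked class pairs might be computed.

```python
import re
from itertools import combinations


def identifier_tokens(name):
    """Split a camelCase class identifier into a set of lowercase terms."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return {p.lower() for p in parts}


def jaccard(a, b):
    """Assumed semantic measure: overlap of two term sets (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def co_change_counts(commits):
    """Assumed logical measure: number of commits changing both classes."""
    counts = {}
    for changed in commits:
        for pair in combinations(sorted(set(changed)), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return cov / (sxx * syy) ** 0.5 if sxx and syy else float("nan")


def directional_overlap(semantic_pairs, logical_pairs):
    """Share of semantic pairs that are also logical, and the reverse."""
    both = semantic_pairs & logical_pairs
    sem_to_log = len(both) / len(semantic_pairs) if semantic_pairs else 0.0
    log_to_sem = len(both) / len(logical_pairs) if logical_pairs else 0.0
    return sem_to_log, log_to_sem


if __name__ == "__main__":
    # Hypothetical change history: each commit lists the classes it touched.
    commits = [
        ["OrderService", "OrderRepository"],
        ["OrderService", "OrderRepository", "HTTPServer"],
        ["HTTPServer", "OrderService"],
    ]
    counts = co_change_counts(commits)
    pairs = sorted(counts)
    semantic = [jaccard(identifier_tokens(a), identifier_tokens(b)) for a, b in pairs]
    logical = [counts[p] for p in pairs]
    print("rho:", spearman(semantic, logical))
```

Reporting the two overlap ratios separately is what allows an asymmetric result to be expressed, such as most semantic dependencies also being change dependencies without the converse holding.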
