An empirical study on the interplay between semantic coupling and co-change of software classes

Software systems continuously evolve to accommodate new features, and interoperability relationships between artifacts make the impact of software changes increasingly relevant. During maintenance, developers must ensure that related entities are updated consistently with these changes. Studies in the static change impact analysis domain have found that combining source code and lexical information outperforms using either one independently. However, extracting lexical information and measuring how loosely or closely related two software artifacts are, based on the semantic information embedded in their comments and identifiers, has been carried out using somewhat complex information retrieval (IR) techniques. The interplay between the strengths of semantic and change relationships has also not been extensively studied. This work aims to fill both gaps by comparing the effectiveness of measuring the semantic coupling of object-oriented (OO) software classes using (i) simple identifier-based techniques and (ii) the word corpora of the entire classes in a software system. We then empirically investigate the interplay between semantic and change coupling. The empirical results show that (1) identifier-based methods are more computationally efficient but cannot always be used interchangeably with corpora-based methods of computing the semantic coupling of classes, and (2) there is no correlation between semantic and change coupling. Furthermore, we found that (3) there is a directional relationship between the two: over 70% of the semantic dependencies are also linked by change coupling, but not vice versa.
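The directional relationship in result (3) can be illustrated with a small sketch. It assumes that each kind of coupling has already been reduced to a set of unordered class pairs; the class names, the pair sets, and the helper names `unordered` and `overlap_ratio` are hypothetical and are not taken from the study's data or tooling.

```python
def unordered(a, b):
    """Normalise a class pair so that (A, B) and (B, A) compare equal."""
    return tuple(sorted((a, b)))


def overlap_ratio(source_pairs, target_pairs):
    """Fraction of source_pairs that also appear in target_pairs."""
    if not source_pairs:
        return 0.0
    return len(source_pairs & target_pairs) / len(source_pairs)


# Hypothetical toy data: pairs of classes judged semantically coupled.
semantic_pairs = {
    unordered("Parser", "Lexer"),
    unordered("Parser", "Token"),
    unordered("Cache", "Store"),
    unordered("Report", "Chart"),
}

# Hypothetical toy data: pairs of classes that changed together.
change_pairs = {
    unordered("Lexer", "Parser"),
    unordered("Parser", "Token"),
    unordered("Cache", "Store"),
    unordered("Ui", "Store"),
    unordered("Ui", "Theme"),
    unordered("Log", "Store"),
}

# The ratio is asymmetric because the denominators differ.
semantic_also_change = overlap_ratio(semantic_pairs, change_pairs)
change_also_semantic = overlap_ratio(change_pairs, semantic_pairs)

print(f"semantic pairs that are also change-coupled: {semantic_also_change:.2f}")
print(f"change pairs that are also semantically coupled: {change_also_semantic:.2f}")
```

Because the two ratios share a numerator but not a denominator, a large share of semantic pairs can be change-coupled while the reverse share stays small, which is the sense in which the relationship is directional.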
