On the Conceptual Cohesion of Co-Change Clusters

The analysis of co-change clusters as an alternative software decomposition can provide insights on different perspectives of modularity. But the usual approach using coarse-grained entities does not provide relevant information, like the conceptual cohesion of the modular abstractions that emerge from co-change clusters. This work presents a novel approach to analyze the conceptual cohesion of the source-code associated with co-change clusters of fine-grained entities. We obtain from the change history information found in version control systems. We describe the use of our approach to analyze six well established and currently active open-source projects from different domains and one of the most relevant systems of the Brazilian Government for the financial domain. The results show that co-change clusters offer a new perspective on the code based on groups with high conceptual cohesion between its entities (up to 69% more than the original package decomposition), and, thus, are suited to detect concepts pervaded on codebases, opening new possibilities of comprehension of source-code by means of the concepts embodied in the co-change clusters.

[1]  Bogdan Dit,et al.  Measuring the Semantic Similarity of Comments in Bug Reports , 2008 .

[2]  Dirk Beyer,et al.  Mining Co-Change Clusters from Version Repositories , 2005 .

[3]  Vipin Kumar,et al.  Chameleon: Hierarchical Clustering Using Dynamic Modeling , 1999, Computer.

[4]  Georgios Gousios,et al.  Untangling fine-grained code changes , 2015, 2015 IEEE 22nd International Conference on Software Analysis, Evolution, and Reengineering (SANER).

[5]  Gabriele Bavota,et al.  Improving software modularization via automated analysis of latent topics and dependencies , 2014, TSEM.

[6]  Carlos José Pereira de Lucena,et al.  Heuristic expansion of feature mappings in evolving program families , 2014, Softw. Pract. Exp..

[7]  Osamu Mizuno,et al.  Historage: fine-grained version control system for Java , 2011, IWPSE-EVOL '11.

[8]  Spiros Mancoridis,et al.  On the automatic modularization of software systems using the Bunch tool , 2006, IEEE Transactions on Software Engineering.

[9]  Marco Tulio Valente,et al.  Remodularization analysis using semantic clustering , 2014, 2014 Software Evolution Week - IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE).

[10]  Harald C. Gall,et al.  Detection of logical coupling based on product release history , 1998, Proceedings. International Conference on Software Maintenance (Cat. No. 98CB36272).

[11]  Andreas Zeller,et al.  The impact of tangled code changes , 2013, 2013 10th Working Conference on Mining Software Repositories (MSR).

[12]  Andreas Zeller,et al.  Mining version histories to guide software changes , 2005, Proceedings. 26th International Conference on Software Engineering.

[13]  Andrian Marcus,et al.  Supporting program comprehension using semantic and structural information , 2001, Proceedings of the 23rd International Conference on Software Engineering. ICSE 2001.

[14]  Fabian Beck,et al.  On the impact of software evolution on software clustering , 2012, Empirical Software Engineering.

[15]  Harald C. Gall,et al.  CVS release history data for detecting logical couplings , 2003, Sixth International Workshop on Principles of Software Evolution, 2003. Proceedings..

[16]  Stéphane Ducasse,et al.  Semantic clustering: Identifying topics in source code , 2007, Inf. Softw. Technol..

[17]  Fabian Beck,et al.  Evaluating the Impact of Software Evolution on Software Clustering , 2010, 2010 17th Working Conference on Reverse Engineering.

[18]  Richard C. Holt,et al.  Predicting change propagation in software systems , 2004, 20th IEEE International Conference on Software Maintenance, 2004. Proceedings..

[19]  Susan T. Dumais,et al.  Improving the retrieval of information from external sources , 1991 .

[20]  Onaiza Maqbool,et al.  Hierarchical Clustering for Software Architecture Recovery , 2007, IEEE Transactions on Software Engineering.

[21]  Dirk Beyer,et al.  Clustering software artifacts based on frequent common changes , 2005, 13th International Workshop on Program Comprehension (IWPC'05).

[22]  Thomas Zimmermann,et al.  Preprocessing CVS Data for Fine-Grained Analysis , 2004, MSR.

[23]  T. Landauer,et al.  Indexing by Latent Semantic Analysis , 1990 .

[24]  Sven Apel,et al.  Granularity in software product lines , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[25]  Martin P. Robillard,et al.  The Emergent Structure of Development Tasks , 2005, ECOOP.

[26]  Marcelo de Almeida Maia,et al.  Assessing modularity using co-change clusters , 2014, MODULARITY.

[27]  Andreas Zeller,et al.  How history justifies system architecture (or not) , 2003, Sixth International Workshop on Principles of Software Evolution, 2003. Proceedings..

[28]  A. Steven Klusener,et al.  Assessing software archives with evolutionary clusters , 2008, 2008 16th IEEE International Conference on Program Comprehension.

[29]  Denys Poshyvanyk,et al.  The conceptual cohesion of classes , 2005, 21st IEEE International Conference on Software Maintenance (ICSM'05).

[30]  T. A. Wiggerts,et al.  Using clustering algorithms in legacy systems remodularization , 1997, Proceedings of the Fourth Working Conference on Reverse Engineering.

[31]  Mik Kersten,et al.  Using task context to improve programmer productivity , 2006, SIGSOFT '06/FSE-14.