Modeling class cohesion as mixtures of latent topics

This paper proposes a new measure of class cohesion in object-oriented software systems. The measure is based on the analysis of latent topics embedded in the comments and identifiers of source code. Named Maximal Weighted Entropy, it uses Latent Dirichlet Allocation and information entropy to quantify the cohesion of classes. The paper presents the principles and technology behind the proposed measure, followed by two case studies on a large open source software system. The case studies compare the new measure with an extensive set of existing metrics and use those metrics to construct models that predict software faults. They indicate that the new measure captures aspects of class cohesion that existing cohesion measures do not, and that combining most of the existing metrics with Maximal Weighted Entropy improves their fault prediction.
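The abstract does not give the formula for Maximal Weighted Entropy, so the following is only an illustrative sketch of how a topic-and-entropy cohesion score of this kind could be computed. It assumes that each method of a class has already been assigned a topic distribution (for example by an LDA model), weights each topic by its average probability across methods, multiplies that by the normalized entropy of how the topic is spread over the methods, and takes the maximum over topics. The function name `max_weighted_topic_entropy`, the weighting scheme, and the toy inputs are all assumptions made for illustration, not the paper's definition.

```python
import math


def max_weighted_topic_entropy(method_topics):
    """Illustrative cohesion score (an assumption, not the paper's formula).

    method_topics: one topic distribution per method of a class,
    e.g. [[0.9, 0.1], [0.8, 0.2]]; each inner list sums to 1.
    """
    n_methods = len(method_topics)
    n_topics = len(method_topics[0])
    scores = []
    for t in range(n_topics):
        column = [dist[t] for dist in method_topics]
        total = sum(column)
        # A single method, or a topic that never occurs, gives no spread.
        if total == 0 or n_methods < 2:
            scores.append(0.0)
            continue
        # Weight: how much of the class the topic occupies on average.
        occupancy = total / n_methods
        # Spread: normalized entropy of the topic's share per method
        # (1.0 when every method uses the topic equally).
        shares = [p / total for p in column]
        entropy = -sum(q * math.log(q) for q in shares if q > 0)
        spread = entropy / math.log(n_methods)
        scores.append(occupancy * spread)
    return max(scores)


# Toy inputs: hand-written distributions, not output from a real LDA run.
cohesive = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
scattered = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(max_weighted_topic_entropy(cohesive))
print(max_weighted_topic_entropy(scattered))
```

Under this assumed scoring, a class whose methods all share one dominant topic scores close to 1, while a class whose methods each address a different topic scores 0.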
