An exploratory study to investigate the impact of conceptualization in god class detection

Context: The concept of code smells is widespread in software engineering. However, despite the many discussions and claims about them, few empirical studies support or contest these ideas. In particular, the study of how humans perceive what a code smell is and how to deal with it has been largely neglected. Objective: To build empirical support for understanding the effect of god classes, one of the best-known code smells. In particular, this paper focuses on how conceptualization affects the identification of god classes, i.e., how different people perceive the god class concept. Method: A controlled experiment that extends and builds upon an earlier empirical study on how humans detect god classes [19]. Our study: i) deepens and details some of the research questions of the previous study, ii) introduces a new research question, and iii) compares the results of both studies where possible. Result: Our findings show that participants apply different personal criteria and preferences when choosing drivers to identify god classes. Agreement between participants is not high, which is in line with previous studies. Conclusion: This study helps expand the empirical data on the human perception of code smells. It also presents a new way to evaluate effort and distraction in experiments through the automatic logging of participant actions.
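Because several of the cited works ([1], [4]-[7], [9], [12]) concern inter-rater agreement statistics, a minimal sketch may help make the "agreement between participants" result concrete. The snippet below computes Fleiss' kappa [6] for a vote matrix of god-class judgments; the matrix values, the two-category scheme ("not a god class" / "god class"), and the function name are illustrative assumptions, not data or tooling from the study.

import numpy as np

def fleiss_kappa(ratings: np.ndarray) -> float:
    """Fleiss' kappa for an (items x categories) count matrix.

    ratings[i, j] = number of raters who assigned item i to category j.
    Assumes every item was rated by the same number of raters.
    """
    n_items, _ = ratings.shape
    n_raters = ratings[0].sum()

    # Observed agreement: for each item, the proportion of rater pairs that agree.
    p_i = (np.square(ratings).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()

    # Chance agreement from the overall category proportions.
    p_j = ratings.sum(axis=0) / (n_items * n_raters)
    p_e = np.square(p_j).sum()

    return (p_bar - p_e) / (1 - p_e)

# Hypothetical data: 5 candidate classes rated by 10 participants;
# column 0 = "not a god class", column 1 = "god class".
votes = np.array([
    [8, 2],
    [3, 7],
    [5, 5],
    [9, 1],
    [4, 6],
])
print(f"Fleiss' kappa = {fleiss_kappa(votes):.3f}")

One reason the reference list also includes alternatives such as the Finn coefficient [12] is that kappa-style statistics can behave counterintuitively under skewed marginal distributions, the paradox discussed in [4] and [7].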

[1] J. R. Landis et al. The measurement of observer agreement for categorical data. Biometrics, 1977.

[2] M. Mäntylä et al. Subjective evaluation of software evolvability using code smells: An empirical study. Empirical Software Engineering, 2006.

[3] Raed Shatnawi et al. An empirical study of the bad smells and class error probability in the post-release object-oriented system evolution. J. Syst. Softw., 2007.

[4] K. Gwet. Kappa Statistic is not Satisfactory for Assessing the Extent of Agreement Between Raters. 2002.

[5] A. J. Conger. Integration and generalization of kappas for multiple raters. 1980.

[6] J. Fleiss. Measuring nominal scale agreement among many raters. 1971.

[7] A. Feinstein et al. High agreement but low kappa: I. The problems of two paradoxes. Journal of Clinical Epidemiology, 1990.

[8] Forrest Shull et al. Building empirical support for automated code smell detection. ESEM '10, 2010.

[9] David M. W. Powers et al. The Problem with Kappa. EACL, 2012.

[10] Daniela Cruzes et al. Are all code smells harmful? A study of God Classes and Brain Classes in the evolution of three open source systems. IEEE International Conference on Software Maintenance, 2010.

[11] Mika Mäntylä et al. Drivers for software refactoring decisions. ISESE '06, 2006.

[12] R. H. Finn. A Note on Estimating the Reliability of Categorical Data. 1970.

[13] Mika Mäntylä et al. An experiment on subjective evolvability evaluation of object-oriented software: explaining factors and interrater agreement. International Symposium on Empirical Software Engineering, 2005.

[14] Mika Mäntylä et al. Bad smells - humans as code critics. 20th IEEE International Conference on Software Maintenance, 2004.

[15] Daniela Cruzes et al. The evolution and impact of code smells: A case study of two open source systems. 3rd International Symposium on Empirical Software Engineering and Measurement, 2009.

[16] Grover J. Whitehurst et al. Interrater agreement for journal manuscript reviews. 1984.

[17] Arthur J. Riel et al. Object-Oriented Design Heuristics. 1996.

[18] Stéphane Ducasse et al. Object-Oriented Metrics in Practice. 2005.