Automated Classification of Class Role-Stereotypes via Machine Learning

Role-stereotypes indicate generic roles that classes play in the design of software systems (e.g., controller, information holder, or interfacer). Knowledge of role-stereotypes can help with various tasks in software development and maintenance, such as program understanding, program summarization, and quality assurance. This paper presents an automated machine learning-based approach for classifying the role-stereotype of classes in Java. We analyse the performance of this approach against a manually labelled ground truth for a sizable open source project (770+ Java classes) for the Android platform. Moreover, we compare our approach to an existing rule-based classification approach. The contributions of this paper include an analysis of which machine learning algorithms and which features provide the best classification performance. This analysis shows that the Random Forest algorithm yields the best classification performance. We find, however, that the performance of the ML classifier varies considerably across role-stereotypes; in particular, its performance degrades for rare role-stereotypes. Our ML classifier improves over the existing rule-based classification method in that the ML approach classifies all classes, while rule-based approaches leave a significant number of classes unclassified.
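The observation that performance differs per role-stereotype and drops for rare ones can be illustrated with a minimal sketch of per-class scoring. The labels and predictions below are hypothetical toy data, not results from the paper, and the helper `per_class_scores` is an illustrative name rather than part of the authors' tooling. The sketch only shows why an aggregate accuracy figure can hide a poorly classified rare class.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for each true label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label received a wrong instance
            fn[truth] += 1  # true label missed one of its instances
    scores = {}
    for label in sorted(set(y_true)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1, actual)
    return scores


# Hypothetical, imbalanced toy data (assumption for illustration only):
# six information holders, three interfacers, one controller.
y_true = ["information_holder"] * 6 + ["interfacer"] * 3 + ["controller"]
y_pred = (["information_holder"] * 6
          + ["interfacer"] * 2 + ["information_holder"]
          + ["information_holder"])

scores = per_class_scores(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"overall accuracy: {accuracy:.2f}")
for label, (precision, recall, f1, support) in scores.items():
    print(f"{label:20s} P={precision:.2f} R={recall:.2f} "
          f"F1={f1:.2f} n={support}")
```

In this toy setting the overall accuracy looks reasonable while the single rare class is never predicted correctly, which is why per-class measures are worth reporting alongside an aggregate score on imbalanced data.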
