An Analysis of Machine Learning Algorithms for Condensing Reverse Engineered Class Diagrams

A range of techniques is available for reverse engineering software designs from source code. However, these techniques produce highly detailed representations. Condensing reverse engineered representations into higher-level design information would improve the understandability of reverse engineered diagrams. This paper describes an automated approach for condensing reverse engineered class diagrams into diagrams that resemble forward designed UML models. To this end, we propose a machine learning approach. Its training set consists of forward designed UML class diagrams and reverse engineered class diagrams of the same system. From this training set, the method learns to select the key classes to include in the class diagrams. In this paper, we study nine classification algorithms from the machine learning community and evaluate which of them perform best at predicting the key classes in a class diagram.
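The abstract does not specify the features, the nine algorithms, or the evaluation metric, so the following Python sketch rests on stated assumptions. Each reverse engineered class is described by three hypothetical metrics (attributes, operations, associations); a class is labeled "key" when it also appears in the forward designed diagram; logistic regression stands in for one candidate classifier; and ROC AUC is used to compare the trained model with a single-metric baseline. The class names and metric values are invented. For brevity it scores the training data itself; a real comparison of algorithms would score held-out data, for example via cross-validation.

```python
import math
from typing import Callable, Dict, List, Sequence, Set, Tuple

# Hypothetical per-class design metrics from a reverse engineered class diagram:
# (number of attributes, number of operations, number of associations).
Features = Tuple[float, ...]


def build_training_set(
    reverse_classes: Dict[str, Features], forward_classes: Set[str]
) -> Tuple[List[str], List[Features], List[int]]:
    """Label a reverse engineered class 1 ('key') if it also appears in the
    forward designed diagram of the same system, otherwise 0."""
    names = sorted(reverse_classes)
    X = [reverse_classes[n] for n in names]
    y = [1 if n in forward_classes else 0 for n in names]
    return names, X, y


def standardize(X: List[Features]) -> List[List[float]]:
    """Z-score each feature column so gradient descent is well conditioned."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has std 0; fall back to 1 to avoid division by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: List[List[float]], y: List[int], lr: float = 0.1,
                   epochs: int = 2000) -> Callable[[Sequence[float]], float]:
    """Fit logistic regression by batch gradient descent; return a scorer
    that maps a feature vector to P(key class)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random key class outscores a random
    non-key class (ties count half)."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one key and one non-key class")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy system, invented for illustration only.
reverse = {
    "Order": (5, 12, 4), "Customer": (4, 8, 3), "Invoice": (3, 7, 3),
    "OrderDTO": (5, 0, 1), "StringUtils": (0, 6, 0), "Logger": (1, 3, 0),
}
forward = {"Order", "Customer", "Invoice"}

names, X, y = build_training_set(reverse, forward)
Xs = standardize(X)
model = train_logistic(Xs, y)
logistic_auc = roc_auc([model(x) for x in Xs], y)
baseline_auc = roc_auc([x[2] for x in X], y)  # rank by association count alone
print(f"logistic AUC={logistic_auc:.2f}, association-count AUC={baseline_auc:.2f}")
```

Comparing several classifiers then amounts to swapping `train_logistic` for other learners that return a scorer and ranking them by `roc_auc`; AUC is convenient here because it depends only on how classes are ranked, not on a fixed decision threshold.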
