Design recovery and data mining: a methodology that identifies data-cohesive subsystems based on mining association rules

Software maintenance is both a technical and an economic concern for organizations. Large software systems are difficult to maintain due to their intrinsic complexity, and their maintenance consumes between 50% and 90% of the cost of their complete life-cycle. An essential step in maintenance is reverse engineering, which focuses on understanding the system. This understanding is critical to avoid introducing undesired side effects during maintenance. The objective of this research is to investigate the potential of applying data mining to reverse engineering. This research was motivated by the following: (1) data mining can process large volumes of information, (2) data mining can elicit meaningful information without previous knowledge of the domain, (3) data mining can extract novel non-trivial relationships from a data set, and (4) data mining is automatable. These data mining features are used to help address the problem of understanding large legacy systems. This research produced a general method to apply data mining to reverse engineering, and a methodology for design recovery, called Identification of Subsystems based on Associations (ISA). ISA uses association rules mined from a database view of the subject system to guide a clustering process that produces a data-cohesive hierarchical subsystem decomposition of the system. ISA promotes object-oriented principles because each identified subsystem consists of a set of data repositories and the code (i.e., programs) that manipulates them. ISA is an automatic multi-step process, which takes the source code of the subject system and multiple parameters as its input. ISA includes two representation models (i.e., text-based and graphic-based representation models) to present the resulting subsystem decomposition. The automated environment RE-ISA implements the ISA methodology. RE-ISA was used to produce the subsystem decomposition of real-world software systems. 
Results show that ISA can automatically produce data-cohesive subsystem decompositions without previous
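
The general idea of letting association rules guide a grouping of data repositories and programs can be illustrated with a minimal sketch. This is not the ISA algorithm or the RE-ISA implementation, whose steps and parameters are not given in this abstract. The sketch assumes a hypothetical "database view" in which each program is recorded with the set of files it uses, mines only pairwise rules, and builds a flat (not hierarchical) grouping with a simple connected-components step. All names, thresholds, and the sample data are invented for illustration.

```python
from itertools import combinations


def mine_pair_rules(usage, min_support, min_confidence):
    """Mine pairwise rules 'programs that use x also use y'.

    usage maps a program name to the set of files it uses (hypothetical view).
    Returns a list of (x, y, support, confidence) tuples.
    """
    n = len(usage)
    item_count, pair_count = {}, {}
    for files in usage.values():
        for f in files:
            item_count[f] = item_count.get(f, 0) + 1
        for a, b in combinations(sorted(files), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # fraction of programs using both files
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]  # estimate of P(y used | x used)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


def cluster_files(files, rules):
    """Group files linked by at least one rule (union-find, flat grouping)."""
    parent = {f: f for f in files}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path halving
            f = parent[f]
        return f

    for x, y, _, _ in rules:
        parent[find(x)] = find(y)

    groups = {}
    for f in files:
        groups.setdefault(find(f), set()).add(f)
    return sorted(sorted(g) for g in groups.values())


def assign_programs(usage, clusters):
    """Attach each program to the file cluster it overlaps most (ties: first)."""
    subsystems = {tuple(c): [] for c in clusters}
    for program in sorted(usage):
        best = max(clusters, key=lambda c: len(usage[program] & set(c)))
        subsystems[tuple(best)].append(program)
    return subsystems


# Invented sample data: program -> files it reads or writes.
usage = {
    "billing": {"CUSTOMER", "INVOICE"},
    "dunning": {"CUSTOMER", "INVOICE"},
    "payroll": {"EMPLOYEE", "SALARY"},
    "hr_report": {"EMPLOYEE", "SALARY"},
    "audit": {"INVOICE", "SALARY"},
}

rules = mine_pair_rules(usage, min_support=0.3, min_confidence=0.6)
all_files = set().union(*usage.values())
clusters = cluster_files(all_files, rules)
subsystems = assign_programs(usage, clusters)

for files, programs in subsystems.items():
    print(files, "<-", programs)
```

Each resulting group pairs a set of files with the programs that mainly manipulate them, which mirrors the abstract's description of a subsystem as data repositories plus the code that uses them. The support and confidence thresholds here stand in for the "multiple parameters" mentioned above only by analogy.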
