Exploring the Limits of Domain Model Recovery

We are interested in re-engineering families of legacy applications towards using Domain-Specific Languages (DSLs). Is it worth investing in harvesting domain knowledge from the source code of legacy applications? Reverse engineering domain knowledge from source code is sometimes considered very hard or even impossible. Is it also difficult for "modern legacy systems"? In this paper we select two open-source applications and answer the following research questions: which parts of the domain are implemented by the application, and how much can we manually recover from the source code? To explore these questions, we compare manually recovered domain models to a reference model extracted from domain literature, and measure precision and recall. The recovered models are accurate: they cover a significant part of the reference model and they do not contain much junk. We conclude that domain knowledge is recoverable from "modern legacy" code and therefore domain model recovery can be a valuable component of a domain re-engineering process.
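
To make the comparison concrete, the following is a minimal sketch of how precision and recall could be computed when a recovered model and a reference model are each reduced to a set of concept names. This is an illustration only, not the paper's actual procedure: the function name `precision_recall`, the exact-name matching rule, and the example concepts are all assumptions introduced here.

```python
def precision_recall(recovered: set[str], reference: set[str]) -> tuple[float, float]:
    """Compare a recovered domain model to a reference model.

    Both models are simplified to sets of concept names (an assumption);
    a concept counts as matched only if its name appears in both sets.
    """
    matched = recovered & reference
    # Precision: share of recovered concepts that belong to the reference model
    # (low precision means the recovered model contains a lot of junk).
    precision = len(matched) / len(recovered) if recovered else 0.0
    # Recall: share of the reference model that the recovered model covers.
    recall = len(matched) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical concept names, made up for illustration.
reference = {"project", "task", "resource", "milestone", "budget"}
recovered = {"project", "task", "resource", "dialog"}

precision, recall = precision_recall(recovered, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this reading, high recall corresponds to covering a significant part of the reference model, and high precision corresponds to a recovered model with little junk.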
