Reverse engineering software ecosystems

Reverse engineering is an active area of research concerned with discovering techniques and providing tools that support the understanding of software systems. All the techniques proposed so far study individual systems in isolation. However, software systems are seldom developed in isolation. Instead, they are often developed together with other projects in the wider context of an organization or a community. We call the collection of projects developed in such a context a software ecosystem. Often, a software ecosystem and the knowledge associated with it are the most valuable asset of its owner. Sometimes the ecosystem is the very reason for the existence of the organization.
In this thesis we show that software ecosystems are an interesting and challenging subject of study, and that reverse engineering techniques can be applied beyond the level of individual systems to support the understanding of software ecosystems. Our main contributions are threefold: we introduce a methodology for reverse engineering software ecosystems, we provide tools that support the methodology, and we validate the methodology on multiple case studies.
Our methodology is based on analyzing the source code and the information in the versioning system repositories of the projects in an ecosystem, and on generating visual representations of the results. These visual representations present the ecosystem from several complementary perspectives. Given the large amount of information in an ecosystem, we provide exploration mechanisms for navigating it. We distinguish between two dimensions of ecosystem exploration: horizontal exploration allows one to navigate between different views of a given ecosystem, while vertical exploration allows one to dive into the details of individual projects in the ecosystem.
Supporting horizontal exploration is a matter of linking the various ecosystem perspectives in the tool. Supporting vertical exploration implies connecting the ecosystem-level model to the detailed models of the component projects and performing architecture recovery on those models. Since architecture recovery cannot be fully automated, we introduce two techniques that ease the generation of intra-project architectural views. The first technique automates the exploration by classifying modules into a set of structural patterns. The second technique automates the filtering of dependencies in the architectural views by classifying inter-module dependencies according to their evolution.
To validate our contributions we applied our tools and techniques to a set of ecosystem case studies that belong to various organizations: two academic institutions, one industrial software house, and one open-source community. We validated the techniques that work at the architectural level on several well-known open-source software systems.
