Creating and Analyzing Source Code Repository Models - A Model-based Approach to Mining Software Repositories

In mining software repositories (MSR), we analyze the rich data created over the whole evolution of one or more software projects. One major obstacle in MSR is the heterogeneity and complexity of source code as a data source. Model-based technology in general, and reverse engineering in particular, lets us use abstraction to overcome this obstacle. But this raises a new question: can existing reverse engineering frameworks, which were designed to create models from a single revision of a software system, be applied to analyze all revisions of such a system at once? This paper presents a framework that combines EMF, the reverse engineering framework MoDisco, a NoSQL-based model persistence framework, and OCL-like expressions to create and analyze fully resolved AST-level model representations of whole source code repositories. We evaluated the feasibility of this approach with a series of experiments on the Eclipse code base.
