A SURVEY OF TECHNIQUES IN SOFTWARE REPOSITORY MINING

Software developers leave digital records of their software-engineering work during the development process. Source code is usually kept in a version-control repository, and developers rely on issue trackers, online project-tracking software, and informal documentation to support their activities. The research discipline of mining software repositories (MSR) analyzes these existing digital repositories to gain an understanding of the system and its development. MSR has rarely been applied to model-driven development or model-driven engineering (MDE). Model management deserves particular attention: it covers the challenges of "maintaining traceability links among model elements to support model evolution and roundtrip engineering", "tracking versions", and "using models during runtime". These problems can be addressed by using MSR to investigate the models themselves and their relationships to other artifacts. The objective of this report is to survey state-of-the-art research in MSR and to discuss how MSR techniques apply to the problems faced in MDE. Extracting information about which factors affect model quality, how people interact with models in the repository, and how models trace to other artifacts advances our understanding of software engineering when MDE is used.
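As a rough illustration of how MSR can relate models to other artifacts, the following sketch applies co-change (logical coupling) analysis to a commit history: files that repeatedly change in the same commit as a model file are ranked as candidate traceability links. It assumes the history has already been extracted as one set of changed file paths per commit. The `.uml`/`.ecore` extensions, the function names `is_model` and `cochange_links`, and the `min_support` threshold are hypothetical choices for illustration, not part of any particular tool.

```python
from collections import Counter
from itertools import product

# Hypothetical convention: files with these extensions are treated as models.
MODEL_EXTENSIONS = (".uml", ".ecore")


def is_model(path: str) -> bool:
    """Return True if the path looks like a model file (assumed convention)."""
    return path.endswith(MODEL_EXTENSIONS)


def cochange_links(commits, min_support=2):
    """Rank (model file, other file) pairs by how often they change together.

    commits: iterable of sets of file paths, one set per commit.
    Returns (model, artifact, support, confidence) tuples, where support is the
    number of commits touching both files and confidence is support divided by
    the number of commits touching the model file.
    """
    model_changes = Counter()  # commits per model file
    pair_changes = Counter()   # commits per (model, non-model) pair
    for files in commits:
        models = {f for f in files if is_model(f)}
        others = set(files) - models
        model_changes.update(models)
        pair_changes.update(product(models, others))
    links = [
        (m, a, n, n / model_changes[m])
        for (m, a), n in pair_changes.items()
        if n >= min_support
    ]
    # Strongest candidates first; ties broken alphabetically for stable output.
    links.sort(key=lambda t: (-t[2], -t[3], t[0], t[1]))
    return links


if __name__ == "__main__":
    # Toy history: each set is the files changed by one commit.
    history = [
        {"model/Order.ecore", "src/Order.java"},
        {"model/Order.ecore", "src/Order.java", "README.md"},
        {"src/Order.java"},
        {"model/Order.ecore", "src/OrderTest.java"},
    ]
    for link in cochange_links(history):
        print(link)
```

On this toy history only the pair `model/Order.ecore` and `src/Order.java` reaches the default threshold, changing together in two of the model's three commits. In a real repository one would likely also filter out very large commits, which otherwise pair every model with many unrelated files.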
