Tools in Mining Software Repositories

Mining software repositories (MSR) is an important area of research. An international workshop on MSR was established in 2004 under the umbrella of the International Conference on Software Engineering (ICSE). The quality papers received and presented at the workshop led to a full-fledged conference, held since 2007, that focuses purely on issues related to mining software engineering data. This paper is the result of reviewing all the papers published in the proceedings of the conferences on Mining Software Repositories (MSR) and in other related conferences and journals. We analyzed the papers that contain experimental analysis of software projects related to data mining in software engineering, and identified the data sets, techniques, and tools used, developed, or proposed in them. More than half of the papers accomplish their task by building or using data mining tools to mine software engineering data. Our analysis of these papers shows that MSR authors process raw data that is, in general, publicly available. We categorize the tools used in MSR as newly developed tools, traditional data mining tools, developed prototypes, and scripts. We also show the type of mining task performed with these tools, along with the datasets used in these studies.
