Process Mining Software Repositories

Software developers’ activities are generally recorded in software repositories such as version control systems, bug trackers and mail archives. Although such repositories usually contain abundant information, extracting it is often hampered by the need to analyze several repositories simultaneously and to combine the results. We propose to apply process mining techniques, originally developed for business process analysis, to address this challenge. For process mining to become applicable, however, the different software repositories must be combined, and “related” software development events must be matched: e.g., mails sent about a file, modifications of the file and bug reports that can be traced back to it. The combination and matching of events have been implemented in FRASR (Framework for Analyzing Software Repositories), which augments the process mining framework ProM. FRASR has been successfully applied in a series of case studies addressing aspects of the development process such as the roles of different developers and the way bug reports are handled.
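To make the idea of combining repositories and matching "related" events concrete, the following is a minimal sketch and not FRASR's actual implementation. It assumes that every event has already been linked to the file it concerns, and it groups events from three sources into one time-ordered trace per file, which is the kind of case-based event log that process mining tools consume. The Event class, the build_traces function, the source labels and the sample data are all hypothetical names introduced for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One recorded development event (hypothetical schema)."""
    source: str          # assumed labels: "vcs", "bugs" or "mail"
    artifact: str        # file the event has been matched to
    activity: str        # e.g. "commit", "bug reported", "mail sent"
    developer: str
    timestamp: datetime


def build_traces(*repositories):
    """Merge events from several repositories into one time-ordered trace per file."""
    traces = defaultdict(list)
    for events in repositories:
        for event in events:
            # Events about the same file belong to the same case.
            traces[event.artifact].append(event)
    for trace in traces.values():
        trace.sort(key=lambda e: e.timestamp)
    return dict(traces)


# Made-up sample data for three repositories.
vcs = [Event("vcs", "parser.c", "commit", "alice", datetime(2010, 3, 2, 9, 0))]
bugs = [Event("bugs", "parser.c", "bug reported", "bob", datetime(2010, 3, 1, 14, 30))]
mail = [
    Event("mail", "parser.c", "mail sent", "alice", datetime(2010, 3, 1, 16, 0)),
    Event("mail", "lexer.c", "mail sent", "carol", datetime(2010, 3, 3, 11, 0)),
]

traces = build_traces(vcs, bugs, mail)
for artifact, trace in sorted(traces.items()):
    print(artifact, [(e.source, e.activity, e.developer) for e in trace])
```

In this sketch the file acts as the case identifier. Choosing a different case notion, such as the bug report or the developer, would only require grouping on another field.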
