Mining Open Source Software data using regular expressions

Open Source Software (OSS) management has attracted considerable attention in the last few years. Project management for effective software process improvement must be based on quantitative data. However, data collection for measurement is costly and requires collaboration with developers, and data dumps may require a huge effort to understand their schemas and tables. As a result, it is difficult to collect coherent, quantitative data continuously and to use that data in practicing software process improvement. In this paper, we report our results of mining data acquired from SourceForge.net, the largest open source software hosting website. In the process we describe the Mailing list Crawler (MC), which automatically collects mailing list repositories from widely used software development support systems. By presenting integrated measurement results graphically, MC can help developers and managers keep projects under control in real time.
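To make the idea of regular-expression mining concrete, the following is a minimal sketch of how header fields might be pulled out of archived mailing list messages and aggregated into a simple per-sender count. It is not the MC implementation: the patterns, the function names `extract_headers` and `posts_per_sender`, and the assumption that messages are plain text with RFC 5322-style `From:`, `Date:`, and `Subject:` headers are all illustrative choices, not details taken from the paper.

```python
import re
from collections import Counter

# Hypothetical patterns for RFC 5322-style headers (assumption: the
# archive exposes messages as plain text with these header lines).
FROM_RE = re.compile(
    r"^From:[ \t]*(?:.*?<)?([\w.+-]+@[\w.-]+)>?[ \t]*$", re.MULTILINE
)
SUBJECT_RE = re.compile(r"^Subject:[ \t]*(.*)$", re.MULTILINE)
DATE_RE = re.compile(r"^Date:[ \t]*(.*)$", re.MULTILINE)


def extract_headers(message):
    """Return the first From address, Subject, and Date found in a message.

    Fields that cannot be matched are reported as None.
    """
    patterns = (("from", FROM_RE), ("subject", SUBJECT_RE), ("date", DATE_RE))
    result = {}
    for name, pattern in patterns:
        match = pattern.search(message)
        result[name] = match.group(1).strip() if match else None
    return result


def posts_per_sender(messages):
    """Count messages per sender address, ignoring letter case."""
    counts = Counter()
    for message in messages:
        sender = extract_headers(message)["from"]
        if sender:
            counts[sender.lower()] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "From: Alice Example <alice@example.org>\n"
        "Date: Mon, 01 Jan 2007 10:00:00 +0000\n"
        "Subject: [dev] Release plan\n"
        "\n"
        "Body text\n"
    )
    print(extract_headers(sample))
    print(posts_per_sender([sample]))
```

Counts of this kind are one example of the quantitative data that could feed a graphical project overview. A real crawler would also need to handle folded headers, encoded words, and address obfuscation in public archives, none of which this sketch attempts.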
