Tools and Datasets for Mining Libre Software Repositories

Thanks to the open nature of libre (free, open source) software projects, researchers have gained access to a rich set of data on various aspects of software development. Although these data are usually publicly available on the Internet, obtaining and analyzing them conveniently is not easy, and many considerations have to be taken into account. In this chapter we introduce the most relevant data sources found in libre software projects and commonly studied by scholars: source code releases, source code management systems, mailing lists, and issue (bug) tracking systems. The chapter also offers advice on the problems that can arise when retrieving and preparing these data sources for later analysis, and describes the tools and datasets that support these tasks.

DOI: 10.4018/978-1-60960-513-1.ch002
