Analysis of Activity in the Open Source Software Development Community

Open source software is computer software whose source code is publicly available for inspection, modification, and redistribution. While research on a few large, successful projects has provided insights into the nature and practices of the open source software community, it leaves open questions about the thousands of other open source projects that are neither large nor highly successful. In this paper, we describe a data set from SourceForge.net, the world's largest open source software development site, which is available for research purposes. We discuss various data mining techniques that can be applied to the data and the types of research questions that can be answered. We then apply a few of these techniques and provide an analysis of the results.
