In this paper, we provide a preliminary evaluation of the quality and quantity of data on 50,000 open source (OS) projects hosted at the SourceForge.net portal. Using several indicators of project activity, we identify one sample from the entire dataset: the 'most-broadly-active' OS projects. Projects that are active across all of our main indicators of activity account for less than 1% of the projects on the portal, and 75% of the projects currently hosted on SourceForge.net are not, and have never really been, active there. Furthermore, whilst the number of projects being added to SourceForge.net has increased substantially over time, the number of newly added projects that go on to become most-broadly-active projects seems to be decreasing. Finally, we recognise that care needs to be taken in defining samples such as the most-broadly-active projects, as these definitions have implications for the conclusions that one makes and the generalisations that one should draw.