Similarities, challenges and opportunities of Wikipedia content and open source projects

Several years of research have shown that open source software portals often host a large number of software projects that simply do not evolve: they are developed by relatively small communities that struggle to attract a sustained number of contributors. These portals increasingly act as repositories of abandoned projects, and researchers and practitioners should identify ways to take advantage of such content. Similarly, other online content portals (such as Wikipedia) could be harvested for valuable content. In this paper we argue that, despite differences in the expertise required, many projects that rely on content and contributions from users undergo a similar evolution and follow similar patterns: when a project fails to attract contributors, it appears to stagnate or to be abandoned. Far from being a negative finding, this means that even such projects may provide valuable content, which can be identified and harvested on the basis of shared characteristics: using the attributes of ‘usefulness’ and ‘modularity’, we isolate valuable content in both Wikipedia pages and open source software projects. Copyright © 2012 John Wiley & Sons, Ltd.
