Mining Open Source Software (OSS) Data Using Association Rules Network

The Open Source Software (OSS) movement has attracted considerable attention in the last few years. In this paper we report the results of mining data acquired from SourceForge.net, the largest open source software hosting website. In the process we introduce the Association Rules Network (ARN), a (hyper)graphical model that represents a special class of association rules. Using ARNs, we discover important relationships between the attributes of successful OSS projects. We verify and validate these relationships using Factor Analysis, a classical statistical technique related to Singular Value Decomposition (SVD).
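The abstract names ARNs but does not define the special class of rules they represent. The sketch below is one possible reading, and it rests on stated assumptions. Each rule has a single-item consequent and is drawn as a directed hyperedge from its antecedent set to that consequent. The network is grown backward from a chosen goal item. The function names (`single_consequent_rules`, `build_arn`), the toy transactions, and the thresholds are all hypothetical. A brute-force enumeration stands in for a real frequent-itemset miner such as Apriori.

```python
from itertools import combinations

# Hypothetical toy data: each set is one transaction (e.g. the attributes of one project).
# Item names are placeholders, not attributes from the SourceForge.net data.
transactions = [
    {"A", "B", "G"},
    {"A", "B", "G"},
    {"A", "G"},
    {"A", "C"},
    {"A", "C"},
    {"B"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def single_consequent_rules(transactions, min_sup, min_conf, max_len=3):
    """Enumerate rules X -> y whose consequent is a single item.

    Brute force over all itemsets of up to `max_len` items: fine for a toy
    example, not a substitute for a real frequent-itemset miner.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for k in range(2, max_len + 1):
        for itemset in combinations(items, k):
            sup = support(itemset, transactions)
            if sup == 0 or sup < min_sup:
                continue
            for y in itemset:
                body = frozenset(itemset) - {y}
                conf = sup / support(body, transactions)  # body support >= sup > 0
                if conf >= min_conf:
                    rules.append((body, y, sup, conf))
    return rules


def build_arn(rules, goal):
    """Grow a network of rules backward from `goal` (assumed construction).

    Each kept rule is a directed hyperedge from its antecedent set to its
    consequent. Starting at the goal, keep every rule that concludes an item
    already in the network, then add that rule's antecedent items.
    Rules whose antecedent contains the goal are skipped so nothing feeds back
    into it; other cycles are not pruned in this sketch.
    """
    reached = {goal}
    frontier = [goal]
    seen = set()
    edges = []
    while frontier:
        next_frontier = []
        for target in frontier:
            for body, y, sup, conf in rules:
                if y != target or goal in body or (body, y) in seen:
                    continue
                seen.add((body, y))
                edges.append((body, y, sup, conf))
                for x in sorted(body):
                    if x not in reached:
                        reached.add(x)
                        next_frontier.append(x)
        frontier = next_frontier
    return edges


rules = single_consequent_rules(transactions, min_sup=0.3, min_conf=0.65)
for body, y, sup, conf in build_arn(rules, goal="G"):
    print(f"{sorted(body)} -> {y}  (support={sup:.2f}, confidence={conf:.2f})")
```

With `goal="G"`, the rules {B} -> G and {A, B} -> G enter first. Rules concluding A are then pulled in because A appears in one of those antecedents, which gives the layered, goal-directed shape. In this paper's setting a natural goal might be an attribute marking a successful project, but that mapping is an assumption of this sketch.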
