A Survey of Open Source Data Mining Systems

Open source data mining software represents a new trend in data mining research, education and industrial applications, especially in small and medium enterprises (SMEs). With open source software, an enterprise can easily initiate a data mining project using the most current technology. The software is often available at no cost, allowing the enterprise to focus instead on ensuring that its staff can freely learn data mining techniques and methods. Open source also lets staff understand exactly how the algorithms work by examining the source code, if they so desire, and fine-tune the algorithms to suit the specific purposes of the enterprise. However, diversity, instability, scalability and poor documentation can be major concerns in using open source data mining systems. In this paper, we survey open source data mining systems currently available on the Internet. We compare 12 open source systems across several aspects, including general characteristics, data source accessibility, data mining functionality, and usability, and we discuss the advantages and disadvantages of these systems.
