A survey of data mining and knowledge discovery software tools

Knowledge discovery in databases is a rapidly growing field whose development is driven by strong research interests as well as urgent practical, social, and economic needs. While in the last few years knowledge discovery tools have been used mainly in research environments, sophisticated software products are now rapidly emerging. In this paper, we provide an overview of common knowledge discovery tasks and of approaches to solving them. We propose a feature classification scheme that can be used to study knowledge discovery and data mining software. This scheme is based on the software's general characteristics, database connectivity, and data mining characteristics. We then apply our feature classification scheme to investigate 43 software products, which are either research prototypes or commercially available. Finally, we specify features that we consider important for knowledge discovery software to possess in order to accommodate its users effectively, as well as issues that are either not yet addressed or insufficiently solved.
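The three-part feature classification scheme can be pictured as one record per surveyed product, with a group of features for each of the three headings (general characteristics, database connectivity, data mining characteristics). The sketch below is a minimal illustration assuming Python 3.9+; the class `ToolProfile`, the helpers `select` and `tabulate`, the feature keys ("status", "odbc", "tasks"), and the tool names are all hypothetical placeholders, not the paper's actual feature list or its survey data.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolProfile:
    """One surveyed product, described along the three feature groups.

    The keys stored in each group (e.g. "status", "odbc", "tasks") are
    hypothetical placeholders, not the paper's actual feature list.
    """
    name: str
    general: dict[str, Any] = field(default_factory=dict)          # general characteristics
    db_connectivity: dict[str, Any] = field(default_factory=dict)  # database connectivity
    mining: dict[str, Any] = field(default_factory=dict)           # data mining characteristics

    def feature(self, group: str, key: str, default: Any = None) -> Any:
        """Look up one feature value; group is "general", "db", or "mining"."""
        groups = {"general": self.general, "db": self.db_connectivity, "mining": self.mining}
        return groups[group].get(key, default)


def select(tools: list[ToolProfile], group: str, key: str, value: Any) -> list[str]:
    """Return the names of the tools whose feature `key` in `group` equals `value`."""
    return [t.name for t in tools if t.feature(group, key) == value]


def tabulate(tools: list[ToolProfile], group: str, key: str) -> Counter:
    """Count how many tools take each value of one (hashable) feature."""
    return Counter(t.feature(group, key) for t in tools)


# Hypothetical entries for illustration only; not data from the survey.
tools = [
    ToolProfile("ToolA", general={"status": "research"},
                db_connectivity={"odbc": True},
                mining={"tasks": ("classification", "clustering")}),
    ToolProfile("ToolB", general={"status": "commercial"},
                db_connectivity={"odbc": False},
                mining={"tasks": ("classification",)}),
    ToolProfile("ToolC", general={"status": "commercial"},
                db_connectivity={"odbc": True},
                mining={"tasks": ("association rules",)}),
]

print(select(tools, "db", "odbc", True))     # ['ToolA', 'ToolC']
print(tabulate(tools, "general", "status"))  # Counter({'commercial': 2, 'research': 1})
```

Keeping each heading as its own group mirrors the three-way split of the scheme, so comparisons can be made within one group (for example, database connectivity alone) without mixing in unrelated features.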
