Unboxing Data Mining Via Decomposition in Operators - Towards Macro Optimization and Distribution

Data mining deals with finding hidden knowledge patterns in often huge data sets. This paper proposes defining data mining tasks in terms of fine-grained, composable operators instead of coarse-grained black-box algorithms. Data mining tasks in the knowledge discovery in databases (KDD) process typically take a single relational table as input and require data preprocessing and integration beforehand. Combining different kinds of operators (relational, data mining, and data preprocessing operators) yields a novel, holistic view of the knowledge discovery process. In this paper the approach is initially applied to the low-level execution phase, but it offers the potential for rich optimization similar to relational query optimization. We argue that such macro-optimization, which embraces the overall KDD process, leads to better performance than micro-optimization, which focuses on just a small part of it.
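To make the idea of composable operators concrete, the following minimal Python sketch chains one relational operator, one preprocessing operator, and one mining operator in a single pipeline. It is an illustration only, not the operator set or implementation described in the paper: the names `select`, `discretize`, `frequent_values`, and `run_pipeline`, the sample data, and the threshold values are all hypothetical.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, Any]  # hypothetical row representation: column name -> value


def select(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Relational operator (assumed): lazily keep rows satisfying the predicate."""
    return (row for row in rows if predicate(row))


def discretize(rows: Iterable[Row], column: str, threshold: float) -> Iterator[Row]:
    """Preprocessing operator (assumed): map a numeric column to 'low'/'high'."""
    for row in rows:
        out = dict(row)  # copy so the input rows are left untouched
        out[column] = "high" if row[column] >= threshold else "low"
        yield out


def frequent_values(rows: Iterable[Row], column: str, min_support: int) -> List[Tuple[Any, int]]:
    """Mining operator (assumed): values of `column` seen at least `min_support` times."""
    counts = Counter(row[column] for row in rows)
    return sorted((value, n) for value, n in counts.items() if n >= min_support)


def run_pipeline(rows: Iterable[Row]) -> List[Tuple[Any, int]]:
    """Compose the three operator kinds; rows stream through without intermediate tables."""
    adults = select(rows, lambda r: r["age"] >= 18)
    binned = discretize(adults, "income", 50000)
    return frequent_values(binned, "income", 2)


# Hypothetical sample data, for illustration only.
SAMPLE = [
    {"age": 17, "income": 0},
    {"age": 25, "income": 30000},
    {"age": 40, "income": 70000},
    {"age": 31, "income": 80000},
    {"age": 52, "income": 20000},
    {"age": 19, "income": 90000},
]

if __name__ == "__main__":
    print(run_pipeline(SAMPLE))
```

Because each step is a separate operator rather than part of one black-box algorithm, a planner could in principle reorder or fuse steps (for example, applying the selection before the discretization, as done here), which is the kind of macro-optimization the abstract argues for.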
