EUROPEAN CONFERENCE ON MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES

Knowledge Discovery in Databases (KDD) has grown considerably in recent years, yet providing user support for constructing workflows remains problematic. The large number of operators available in current KDD systems makes it difficult for a user to solve her task successfully. Moreover, workflows can easily grow to hundreds of operators, with parts of a workflow applied several times, so constructing them manually becomes hard. In addition, workflows are not checked for correctness before execution; consequently, execution frequently stops with an error after several hours of runtime. In this paper we present a solution to these problems. We introduce a knowledge-based representation of Data Mining (DM) workflows as a basis for cooperative-interactive planning. We also discuss workflow templates, i.e. abstract workflows that can mix executable operators with tasks to be refined later into sub-workflows. This representation helps users structure and handle workflows, as it constrains the number of operators that need to be considered. Finally, workflows can be grouped into templates, which foster re-use and thereby further simplify DM workflow construction.
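The abstract does not spell out the representation itself, so the following is only a minimal Python sketch of the general idea of a template that mixes executable operators with abstract tasks, plus a correctness check run before execution. It rests on our own simplifying assumptions: a workflow is a flat sequence of steps, each operator declares its inputs and outputs as plain type names, and refinement looks up a hypothetical `methods` table mapping a task to one sub-workflow. The names `Operator`, `Task`, `Template`, `refine`, `check`, and the example operators are illustrative, not the paper's ontology or planner.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Operator:
    """Executable step (hypothetical): consumes and produces named data types."""
    name: str
    inputs: List[str]
    outputs: List[str]


@dataclass
class Task:
    """Abstract step (hypothetical) that must later be refined into a sub-workflow."""
    name: str


Step = Union[Operator, Task]


@dataclass
class Template:
    """A workflow template: a sequence that may mix operators and open tasks."""
    steps: List[Step] = field(default_factory=list)

    def is_executable(self) -> bool:
        # Executable only once every abstract task has been refined away.
        return all(isinstance(s, Operator) for s in self.steps)


def refine(template: Template, methods: Dict[str, List[Step]]) -> Template:
    """Replace each task that has a registered sub-workflow (one level deep)."""
    new_steps: List[Step] = []
    for step in template.steps:
        if isinstance(step, Task) and step.name in methods:
            new_steps.extend(methods[step.name])
        else:
            new_steps.append(step)
    return Template(new_steps)


def check(template: Template, available: List[str]) -> List[str]:
    """Report problems before execution: unrefined tasks or unavailable inputs."""
    errors: List[str] = []
    produced = set(available)
    for step in template.steps:
        if isinstance(step, Task):
            errors.append(f"unrefined task: {step.name}")
            continue
        missing = [i for i in step.inputs if i not in produced]
        if missing:
            errors.append(f"{step.name}: missing {missing}")
        produced.update(step.outputs)
    return errors


# Usage: a template with one open task, refined via an assumed method table.
template = Template([
    Task("Preprocess"),
    Operator("DecisionTree", ["CleanData"], ["Model"]),
    Operator("Evaluate", ["Model", "CleanData"], ["Performance"]),
])
methods = {"Preprocess": [
    Operator("ReplaceMissing", ["RawData"], ["NoMissing"]),
    Operator("Normalize", ["NoMissing"], ["CleanData"]),
]}
before = check(template, ["RawData"])
plan = refine(template, methods)
after = check(plan, ["RawData"])
print(before)
print(after)
```

Here the unrefined template yields three errors (the open task and two missing `CleanData` inputs), while the refined plan passes the check with an empty list. Catching such mismatches statically is the kind of pre-execution checking the abstract calls for, rather than discovering them hours into a run; a real system would reason over a richer ontology of data and operator types than these string labels.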
