Database Support for Data Mining Applications

Inductive databases (IDBs) have been proposed to address the problem of knowledge discovery from huge databases. With an IDB, the user or analyst performs a wide variety of operations on data using a query language powerful enough to support all the required processing steps, such as data preprocessing, pattern discovery and pattern postprocessing. We present a synthetic view of important concepts that have been studied within the cInQ European project for the pattern domain of itemsets. Mining itemsets has proved useful not only for association rule mining but also for feature construction, classification, clustering, etc. We introduce the concepts of pattern domain, evaluation functions, primitive constraints, inductive queries and solvers for itemsets. We focus on simple high-level definitions that make it possible to set aside technical details, which the interested reader will find in, among other sources, the cInQ publications.
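To make the vocabulary above concrete, the following is a minimal illustrative sketch, not the formalism or any solver from the cInQ project. It assumes a toy transactional database and treats an inductive query as a conjunction of primitive constraints over itemsets. All names (`DATABASE`, `frequency`, `min_freq`, `max_size`, `solve`) are hypothetical, and the solver is a naive enumeration chosen for readability rather than efficiency.

```python
from itertools import combinations

# Hypothetical toy database: each transaction is a set of items.
DATABASE = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c", "d"},
]


def frequency(itemset, database):
    """Evaluation function: count the transactions that contain the itemset."""
    return sum(1 for row in database if itemset <= row)


def min_freq(threshold):
    """Primitive constraint: the itemset occurs in at least `threshold` transactions."""
    return lambda itemset, database: frequency(itemset, database) >= threshold


def max_size(limit):
    """Primitive constraint: the itemset has at most `limit` items."""
    return lambda itemset, database: len(itemset) <= limit


def solve(database, constraints):
    """Naive solver: enumerate all non-empty itemsets, keep those meeting every constraint."""
    items = sorted(set().union(*database))
    result = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if all(check(candidate, database) for check in constraints):
                result.append(candidate)
    return result


# An inductive query, read as: itemsets with frequency >= 3 and size <= 2.
answer = solve(DATABASE, [min_freq(3), max_size(2)])
for itemset in answer:
    print(sorted(itemset), frequency(itemset, DATABASE))
```

The enumeration visits every subset of the item universe, so it is only workable on tiny examples; a practical solver would exploit properties of the constraints, such as the anti-monotonicity of a minimal frequency constraint, to prune the search space.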
