Multi-Relational Data Mining (paper id: 294)

An important aspect of data mining algorithms and systems is that they should scale well to large databases. A consequence of this is that most data mining tools are based on machine learning algorithms that work on data in attribute-value format. Experience has shown that such 'single-table' mining algorithms indeed scale well. The downside of this format, however, is that more complex patterns are simply not expressible in it and thus cannot be discovered. One way to increase expressiveness is to generalize, as in inductive logic programming (ILP), from single-table mining to multiple-table mining, i.e., to support mining on full relational databases. The key step in such a generalization is to ensure that the search space does not explode and that efficiency, and thus scalability, is maintained. In this paper we present a framework and an architecture that provide such a generalization. In the framework, the semantic information in the database schema, e.g., foreign keys, is exploited to prune the search space; in the architecture, database primitives are defined to ensure efficiency. Moreover, the framework induces a canonical generalization of algorithms: if the generalized algorithms are run on a single-table database, they give the same results as their single-table counterparts. The framework is illustrated by the Warmr algorithm, which is a multi-relational generalization of the Apriori algorithm.
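The idea of lifting a level-wise, Apriori-style search from one table to several can be illustrated with a minimal sketch. This is not the Warmr algorithm, which searches over first-order queries; it is a simplified stand-in under stated assumptions. The `customers` and `orders` tables, their column names, and the helper functions are all hypothetical. The only schema knowledge used is the foreign key `orders.customer_id`, and support is counted per row of the target table (per customer), not per joined row.

```python
from itertools import combinations

# Hypothetical toy database: a target table plus one table that
# references it through the foreign key orders.customer_id.
customers = [
    {"id": 1, "region": "north"},
    {"id": 2, "region": "north"},
    {"id": 3, "region": "south"},
]
orders = [
    {"customer_id": 1, "product": "beer"},
    {"customer_id": 1, "product": "chips"},
    {"customer_id": 2, "product": "beer"},
    {"customer_id": 2, "product": "chips"},
    {"customer_id": 3, "product": "beer"},
]


def features_per_customer(customers, orders):
    """Map each customer id to the set of conditions that customer satisfies."""
    feats = {c["id"]: {("customers.region", c["region"])} for c in customers}
    for o in orders:
        # Only follow the declared foreign key; no other join is explored.
        if o["customer_id"] in feats:
            feats[o["customer_id"]].add(("orders.product", o["product"]))
    return feats


def frequent_patterns(feats, min_support):
    """Level-wise search; support = number of customers matching every condition."""

    def support(pattern):
        return sum(1 for f in feats.values() if pattern <= f)

    items = sorted({cond for f in feats.values() for cond in f})
    level = [frozenset([item]) for item in items]
    result = {}
    k = 1
    while level:
        # Keep only the candidates of size k that reach the support threshold.
        frequent = {}
        for pattern in level:
            s = support(pattern)
            if s >= min_support:
                frequent[pattern] = s
        result.update(frequent)
        k += 1
        # Join frequent patterns into candidates of size k, then apply the
        # Apriori pruning rule: every (k-1)-subset must itself be frequent.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        level = [
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        ]
    return result


patterns = frequent_patterns(features_per_customer(customers, orders), min_support=2)
for pattern, count in sorted(patterns.items(), key=lambda pc: (len(pc[0]), sorted(pc[0]))):
    print(sorted(pattern), count)
```

If the `orders` table is left empty, the search runs over the conditions of the single `customers` table only, so the sketch behaves like an ordinary single-table frequent-itemset search. That mirrors, in a very reduced form, the canonical-generalization property described above.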
