Frequent query discovery: a unifying ILP approach to association rule mining

Discovery of frequent patterns has been studied in a variety of data mining (DM) settings. In its simplest form, known from association rule mining, the task is to find all frequent itemsets, i.e., to list all combinations of items that are found in a sufficient number of examples. A task similar in spirit, but at the opposite end of the complexity scale, is the Inductive Logic Programming (ILP) approach, where the goal is to discover queries in first-order logic that succeed with respect to a sufficient number of examples. We discuss the relationship of ILP to frequent pattern discovery. On the one hand, our goal is to relate data mining problems to ILP. On the other hand, we want to demonstrate how ILP can be used to solve both existing and new data mining problems. The fundamental task of association rule and frequent set discovery has been extended in various directions, allowing more useful patterns to be discovered. From an ILP viewpoint, however, it can be argued that these settings are all well-controlled subtasks of the full ILP counterpart of the problem. We try to clarify the blurred picture by describing the existing approaches using a unified database representation. With this representation, we also relate the DM settings to each other and propose some interesting new areas to be explored. We analyse some aspects of the gradual change in the trade-off between expressivity and efficiency as one moves from the frequent set problem towards ILP.
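To make the simplest setting concrete, the following is a minimal sketch of frequent itemset discovery, not the method of the paper. The function name `frequent_itemsets`, the parameter `min_support`, and the toy data are all hypothetical, and the levelwise candidate generation is an assumption chosen for brevity.

```python
from itertools import combinations


def frequent_itemsets(examples, min_support):
    """Return {itemset: support} for every itemset contained in at
    least `min_support` examples (illustrative levelwise search)."""
    examples = [frozenset(e) for e in examples]
    items = sorted({item for e in examples for item in e})
    frequent = {}
    # Level 1: every single item is a candidate.
    level = [frozenset([item]) for item in items]
    while level:
        # Count in how many examples each candidate occurs.
        counts = {c: sum(1 for e in examples if c <= e) for c in level}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # Next level: join survivors into sets one item larger, and keep
        # only those whose immediate subsets are all frequent.
        size = len(level[0]) + 1
        joined = {a | b for a in survivors for b in survivors
                  if len(a | b) == size}
        level = [c for c in joined
                 if all(frozenset(s) in survivors
                        for s in combinations(c, size - 1))]
    return frequent


# Hypothetical toy data: each example is a set of items.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = frequent_itemsets(baskets, min_support=2)
```

In the ILP counterpart described above, the itemsets would be replaced by first-order queries and the subset test by checking whether a query succeeds on an example; this sketch covers only the propositional case.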
