Discovering Relational Item Sets Efficiently

Frequent item set mining is a major data mining research area. Generalising from the standard single-table case to a multi-relational setting is simple in principle, but hard in practice. It is simple because frequent item sets are easy to define in the multi-relational setting, and the A-Priori algorithm is easy to extend to it. It is hard because the well-known frequent pattern explosion at low minimum-support (min-sup) settings is far worse than in the standard case. In this paper we introduce an effective algorithm for the discovery of frequent multi-relational item sets. These relational patterns show which item sets occur together, answering questions such as: ‘What types of Books are bought together with what types of Records?’. Hence, they provide symmetric insight into the relation and reveal patterns that are relevant with respect to it. The algorithm, R-KRIMP, extends our earlier work on using MDL (the Minimum Description Length principle) to discover a small set of characteristic item sets. It first discovers a small set of characteristic patterns in each single table and then combines these to find a small set of characteristic multi-relational item sets. This reduces the original search space dramatically and, hence, brings down the computational complexity by orders of magnitude. In the experiments we show that this approach yields a very good approximation of the naive approach, which joins all tables into one huge table, while being far more efficient.
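The naive approach mentioned above (join the tables into one table, then mine that table) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not R-KRIMP itself: the toy tables `books` and `records`, the relation `bought_with`, and the helpers `join_tables`, `apriori` and `relational` are all hypothetical, and the sketch uses plain level-wise A-Priori rather than the MDL-based pattern selection described in the abstract.

```python
from itertools import combinations

# Hypothetical toy data: each tuple is described by a set of attribute=value items.
books = {
    "b1": {"genre=sf", "format=paperback"},
    "b2": {"genre=sf", "format=hardcover"},
    "b3": {"genre=history", "format=paperback"},
}
records = {
    "r1": {"style=jazz"},
    "r2": {"style=rock"},
}
# Hypothetical relation: which book was bought together with which record.
bought_with = [("b1", "r2"), ("b2", "r2"), ("b3", "r1"), ("b1", "r1")]


def join_tables(left, right, relation):
    """Naive approach: one joined row per related pair, items prefixed by their table."""
    rows = []
    for l_id, r_id in relation:
        items = {"B." + i for i in left[l_id]} | {"R." + i for i in right[r_id]}
        rows.append(frozenset(items))
    return rows


def apriori(rows, min_sup):
    """Level-wise A-Priori: return {item set: absolute support} for support >= min_sup."""
    result = {}
    level = {frozenset([i]) for row in rows for i in row}
    k = 1
    while level:
        counts = {c: sum(1 for row in rows if c <= row) for c in level}
        frequent = {c: s for c, s in counts.items() if s >= min_sup}
        result.update(frequent)
        k += 1
        # Candidates of size k: unions of two frequent (k-1)-sets whose
        # (k-1)-subsets are all frequent (the A-Priori property).
        level = set()
        for a, b in combinations(list(frequent), 2):
            cand = a | b
            if len(cand) == k and all(
                frozenset(s) in frequent for s in combinations(cand, k - 1)
            ):
                level.add(cand)
    return result


def relational(itemsets):
    """Keep only the item sets that mix items from both tables."""
    return {
        s: n
        for s, n in itemsets.items()
        if any(i.startswith("B.") for i in s) and any(i.startswith("R.") for i in s)
    }


joined = join_tables(books, records, bought_with)
print(relational(apriori(joined, min_sup=2)))
```

On this toy input the joined table has one row per related pair, and at `min_sup=2` the only item sets spanning both tables are {B.genre=sf, R.style=rock} and {B.format=paperback, R.style=jazz}, each with support 2. Because every row of the join combines the items of both tables, the candidate item sets mix items from both sides, which gives some intuition for why the pattern explosion is worse in the multi-relational setting.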
