Frequent item set mining is a major research area in data mining. Generalising it from the standard single-table case to a multi-relational setting is simple in principle but hard in practice. It is simple because frequent item sets are easy to define in the multi-relational setting and the A-Priori algorithm is easy to extend to it. It is hard because the well-known explosion of frequent patterns at low min-sup settings is far worse than in the standard case. In this paper we introduce an effective algorithm for the discovery of frequent multi-relational item sets. These relational patterns show which item sets occur together across a relation, answering questions such as: ‘What types of Books are bought together with what types of Records?’ Hence, they provide a symmetric insight into the relation and reveal patterns that are relevant with respect to it. The algorithm, R-KRIMP, extends our earlier work on using MDL to discover a small set of characteristic item sets. It first discovers a small set of characteristic patterns in each single table and then combines these to find a small set of characteristic multi-relational item sets. This reduces the original search space dramatically and hence brings down the computational complexity by orders of magnitude. In the experiments we show that this approach yields a very good approximation of the naive approach, which joins all tables into one huge table, while being far more efficient.
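The two-stage idea can be illustrated with a minimal sketch. It is not R-KRIMP itself: the toy tables, the `bought_together` relation, the hand-picked per-table pattern lists (standing in for the characteristic patterns an MDL-based step would select), and the support definition (the number of related row pairs in which both item sets occur) are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy tables: each row id maps to the set of items in that row.
books = {
    1: {"fiction", "paperback"},
    2: {"fiction", "hardcover"},
    3: {"science", "paperback"},
}
records = {
    10: {"jazz", "vinyl"},
    11: {"jazz", "cd"},
    12: {"rock", "cd"},
}

# Hypothetical relation: (book row id, record row id) pairs bought together.
bought_together = [(1, 10), (1, 11), (2, 10), (3, 12), (2, 11)]

# Assumption: a small set of patterns per table, chosen by hand here.
# In the paper's setting these would come from the single-table step.
book_patterns = [frozenset({"fiction"}), frozenset({"paperback"})]
record_patterns = [frozenset({"jazz"}), frozenset({"cd"})]


def relational_support(x, y, left, right, relation):
    """Count related row pairs where x occurs on the left and y on the right."""
    return sum(1 for l, r in relation if x <= left[l] and y <= right[r])


def frequent_pairs(left_patterns, right_patterns, left, right, relation, min_sup):
    """Combine only the pre-selected per-table patterns and keep frequent pairs."""
    result = {}
    for x, y in product(left_patterns, right_patterns):
        sup = relational_support(x, y, left, right, relation)
        if sup >= min_sup:
            result[(x, y)] = sup
    return result


pairs = frequent_pairs(
    book_patterns, record_patterns, books, records, bought_together, min_sup=3
)
for (x, y), sup in pairs.items():
    print(sorted(x), "with", sorted(y), "support", sup)
```

Only `len(book_patterns) * len(record_patterns)` candidate pairs are evaluated here, rather than every item set over the joined table, which is the kind of search-space reduction the abstract describes.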