Discovering Relational Items Sets Efficiently

Frequent item set mining is a major research area in data mining. Generalising from the standard single-table case to a multi-relational setting is simple in principle but hard in practice. It is simple because frequent item sets are easily defined in the multi-relational setting, and the A-Priori algorithm is easily extended to it. It is hard because the well-known explosion of frequent patterns at low min-sup settings is far worse than in the standard case. In this paper we introduce an effective algorithm for the discovery of frequent multi-relational item sets. These relational patterns show which item sets from different tables occur together, answering questions such as: ‘Which types of Books are bought together with which types of Records?’. Hence, they provide a symmetric insight into the relation and reveal patterns that are relevant with respect to that relation. Our algorithm, R-KRIMP, extends our earlier work on using the Minimum Description Length (MDL) principle to discover a small set of characteristic item sets. It first discovers a small set of characteristic patterns in each single table and then combines these to find a small set of characteristic multi-relational item sets. This reduces the original search space dramatically and hence brings down the computational complexity by orders of magnitude. In the experiments we show that this approach yields a very good approximation of the naive approach, which joins all tables into one huge table, while being far more efficient.
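To make the naive baseline mentioned above concrete, the following is a minimal sketch, not R-KRIMP itself: it joins two toy tables through a relation into one table, runs a level-wise A-Priori search over the joined rows, and keeps the item sets that span both tables. The data, the `bought_together` relation and the helpers `join`, `apriori` and `relational` are hypothetical, and support is assumed to be the number of joined rows (related pairs) that contain the item set, which may differ from the paper's exact definition.

```python
from itertools import combinations

# Hypothetical toy tables (not from the paper): each tuple is a set of items.
books = {
    "b1": frozenset({"fiction", "paperback"}),
    "b2": frozenset({"fiction", "hardcover"}),
    "b3": frozenset({"cooking", "paperback"}),
}
records = {
    "r1": frozenset({"jazz", "vinyl"}),
    "r2": frozenset({"jazz", "cd"}),
    "r3": frozenset({"pop", "cd"}),
}
# Hypothetical relation linking tuples of the two tables ("bought together").
bought_together = [("b1", "r1"), ("b2", "r2"), ("b1", "r2"), ("b3", "r3")]


def join(left, right, relation, left_name="book", right_name="record"):
    """Naive approach: one row per related pair, items tagged with their table."""
    rows = []
    for left_key, right_key in relation:
        row = {(left_name, item) for item in left[left_key]}
        row |= {(right_name, item) for item in right[right_key]}
        rows.append(frozenset(row))
    return rows


def apriori(rows, min_sup):
    """Level-wise A-Priori search; returns {item set: support} for frequent sets."""
    def support(itemset):
        return sum(1 for row in rows if itemset <= row)

    result = {}
    level = {frozenset([item]) for row in rows for item in row}
    level = {s for s in level if support(s) >= min_sup}
    k = 1
    while level:
        for s in level:
            result[s] = support(s)
        k += 1
        # Candidates of size k: unions of two frequent (k-1)-sets ...
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # ... pruned unless every (k-1)-subset is frequent (A-Priori property).
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in result for sub in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_sup}
    return result


def relational(itemsets, tables=("book", "record")):
    """Keep only item sets that contain items from every table in `tables`."""
    return {s: n for s, n in itemsets.items()
            if {table for table, _ in s} == set(tables)}


freq = apriori(join(books, records, bought_together), min_sup=2)
rel = relational(freq)
print(len(freq), len(rel))  # prints: 12 6
```

On this toy data, 12 item sets reach min-sup 2, and 6 of them combine Book items with Record items, for example `{book: fiction, record: jazz}` with support 3. Because the joined table contains every combination of related tuples, the number of rows and candidate item sets grows quickly as tables are added or min-sup is lowered, which is the pattern explosion the abstract describes for the naive approach.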

Arne Koopman | Arno Siebes