A support-ordered trie for fast frequent itemset discovery

The importance of data mining has grown with the advent of powerful data collection and storage tools; raw data is now so abundant that manual analysis is no longer possible. Unfortunately, data mining problems are difficult to solve, which has prompted the introduction of several novel data structures to improve mining efficiency. Here, we critically examine existing preprocessing data structures used to enhance performance in association rule mining, in an attempt to understand their strengths and weaknesses. Our analyses culminate in a practical structure called the SOTrieIT (support-ordered trie itemset) and two synergistic algorithms that accompany it for the fast discovery of frequent itemsets. Experiments involving a wide range of synthetic data sets reveal that its algorithms outperform FP-growth, a recent association rule mining algorithm with excellent performance, by up to two orders of magnitude, thus verifying its efficiency and viability.
