Compact transaction database for efficient frequent pattern mining

Mining frequent patterns is a fundamental operation in many data mining applications, such as discovering association rules. In this paper, we propose an approach to generating compact transaction databases for efficient frequent pattern mining. It uses a compact tree structure, called the CT-tree, to compress the original transactional data. This allows the CT-Apriori algorithm, a revision of the classical Apriori algorithm, to generate frequent patterns quickly by skipping the initial database scan and substantially reducing the I/O time of each remaining scan. Empirical evaluations on both synthetic and real-world databases show that the approach is effective and efficient: both the storage space requirement and the mining time decrease dramatically.
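
The general idea can be illustrated with a minimal sketch. It is not the CT-tree or CT-Apriori as defined in the paper, whose details are not given here. It assumes a much simpler compact form: identical transactions are collapsed into one entry with a count, and per-item supports are stored alongside. The names `compact` and `apriori_on_compact` are hypothetical. Under these assumptions, the first level of an Apriori-style search reads the stored supports instead of scanning the data, and later levels scan only the distinct transactions.

```python
from collections import Counter
from itertools import combinations


def compact(transactions):
    """Collapse identical transactions into (itemset -> count) and
    precompute per-item supports.

    Hypothetical layout for illustration; not the paper's CT-tree.
    """
    groups = Counter(frozenset(t) for t in transactions)
    item_support = Counter()
    for itemset, count in groups.items():
        for item in itemset:
            item_support[item] += count
    return groups, item_support


def apriori_on_compact(groups, item_support, min_support):
    """Level-wise Apriori-style search over the compact representation."""
    # Level 1 comes from the stored item supports, so no data scan is needed.
    frequent = {frozenset([i]): s
                for i, s in item_support.items() if s >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: unions of frequent (k-1)-itemsets with exactly k items.
        candidates = {a | b for a, b in combinations(list(frequent), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        # Counting step: one pass over distinct transactions, weighted by count.
        counts = Counter()
        for itemset, count in groups.items():
            for c in candidates:
                if c <= itemset:
                    counts[c] += count
        frequent = {c: s for c, s in counts.items() if s >= min_support}
        result.update(frequent)
        k += 1
    return result
```

In this sketch the saving comes from two places: the level-1 supports are available without a scan, and each later pass touches one entry per distinct transaction rather than one per original transaction. How much this helps depends on how often transactions repeat, which is an assumption about the data rather than a result from the paper.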
