DRFP-tree: disk-resident frequent pattern tree

Abstract Frequent itemset mining methods basically address time scalability and greatly rely on available physical memory. However, the size of real-world databases to be mined is exponentially increasing, and hence main memory size is a serious bottleneck of the existing methods. So, it is necessary to develop new methods that do not fully rely on physical memory; new methods that utilize the secondary storage in the mining process should be the target. This motivates the work described in this paper; we mainly propose (Disk Resident Frequent Pattern) DRFP-Growth as a disk based approach similar to FP-Growth. DRFP-growth uses DRFP-tree, which is treated exactly as FP-tree when constructed in main memory and gets into a modified structure when it turns into disk resident to overcome the main memory bottleneck. This way, we are able to mine for frequent itemsets from databases of arbitrary sizes without being restricted by the available physical memory. In other words, we initially try to mine the database using the original FP-growth; we expand into the secondary memory only if we run out of physical memory. So, DRFP-growth is very comparable to FP-growth for small databases and high support threshold values. On the other hand, using DRFP-growth, we are still able to mine huge databases for low support threshold values (the only limitation is the available secondary storage rather than physical memory). The reported test results demonstrate how the proposed approach succeeds for cases where main memory based approaches fail.

[1]  Bart Goethals,et al.  Memory issues in frequent itemset mining , 2004, SAC '04.

[2]  Reda Alhajj,et al.  Alternative Method for Increnentally Constructing the FP-Tree , 2006 .

[3]  Ramez Elmasri,et al.  Fundamentals of Database Systems , 1989 .

[4]  Risto Vaarandi,et al.  A Breadth-First Algorithm for Mining Frequent Patterns from Event Logs , 2004, INTELLCOMM.

[5]  Hongjun Lu,et al.  Constructing suffix tree for gigabyte sequences with megabyte memory , 2005, IEEE Transactions on Knowledge and Data Engineering.

[6]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[7]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[8]  Osmar R. Zaïane,et al.  COFI approach for mining frequent itemsets revisited , 2004, DMKD '04.

[9]  Johannes Gehrke,et al.  DEMON: mining and monitoring evolving data , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[10]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[11]  Carson Kai-Sang Leung,et al.  CanTree: a tree structure for efficient incremental mining of frequent patterns , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).

[12]  Mohammed J. Zaki Scalable Algorithms for Association Mining , 2000, IEEE Trans. Knowl. Data Eng..

[13]  Osmar R. Zaïane,et al.  Incremental mining of frequent patterns without candidate generation or support constraint , 2003, Seventh International Database Engineering and Applications Symposium, 2003. Proceedings..

[14]  Amihood Amir,et al.  A New and Versatile Method for Association Generation , 1997, PKDD.

[15]  Reda Alhajj,et al.  Constructing Complete FP-Tree for Incremental Mining of Frequent Patterns in Dynamic Databases , 2006, IEA/AIE.