A Bounded and Adaptive Memory-Based Approach to Mine Frequent Patterns From Very Large Databases

Most existing methods for association rule mining (ARM) rely on special data structures that project the database, either totally or partially, into primary memory. Traditionally, these data structures reside in main memory and depend on the paging mechanism of the virtual memory manager (VMM) to handle storage once they outgrow primary memory. Typically, the VMM moves the overflowing data to secondary memory according to preassumed memory usage criteria. However, this direct and unplanned use of virtual memory results in unpredictable behavior or thrashing, as reported by some of the works in the literature. This paper tackles the problem by presenting an ARM model capable of mining a transactional database regardless of its size and without relying on the underlying VMM. The proposed approach uses only a bounded portion of primary memory, which leaves the rest of main memory available to other tasks with different priorities. In other words, we propose a specialized memory management system that caters to the needs of the ARM model: the proposed data structure is first constructed in the allocated primary memory, and if at any point it outgrows the allocated memory quota, part of it is saved to secondary memory. The secondary-memory version of the structure is accessed on a block-by-block basis so that both the spatial and temporal locality of I/O accesses are optimized. Thus, the proposed framework takes control of virtual memory access and manages the required virtual memory to the best benefit of the mining process. Several data structures are used to facilitate these optimizations.
Our method has the additional advantage that other tasks of different priorities may run concurrently with the main mining task with as little interference as possible, because it does not rely on the default paging mechanism of the VMM. The reported test results demonstrate the applicability and effectiveness of the proposed approach.
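
The general idea of mining under a fixed memory quota, with explicit spills to secondary memory instead of implicit paging, can be illustrated with a minimal sketch. This is not the data structure proposed in the paper; it only counts the support of single items, and every name in it (`bounded_item_counts`, `max_entries`, `block_bytes`) is hypothetical. It assumes items are strings that contain no tab or newline characters. When the in-memory counter table would exceed its quota, it is written to disk as a sorted run, and the runs are later merged by sequential, buffered reads.

```python
import heapq
import os
import tempfile
from itertools import groupby
from operator import itemgetter


def _spill(counts, run_dir, run_id):
    """Write the in-memory counts to disk as one sorted run; return its path."""
    path = os.path.join(run_dir, f"run_{run_id}.txt")
    with open(path, "w", encoding="utf-8") as f:
        for item in sorted(counts):
            f.write(f"{item}\t{counts[item]}\n")
    return path


def _read_run(path, block_bytes):
    """Stream (item, count) pairs from a run through a buffer of block_bytes."""
    with open(path, "r", encoding="utf-8", buffering=block_bytes) as f:
        for line in f:
            item, count = line.rstrip("\n").split("\t")
            yield item, int(count)


def bounded_item_counts(transactions, max_entries, block_bytes=4096):
    """Yield (item, support) pairs in item order.

    At most max_entries counters are held in memory at any time
    (hypothetical quota, expressed in entries rather than bytes).
    """
    counts = {}
    runs = []
    with tempfile.TemporaryDirectory() as run_dir:
        for transaction in transactions:
            for item in set(transaction):  # count each item once per transaction
                if item not in counts and len(counts) >= max_entries:
                    # Quota reached: save the partial structure to disk.
                    runs.append(_spill(counts, run_dir, len(runs)))
                    counts = {}
                counts[item] = counts.get(item, 0) + 1
        if counts:
            runs.append(_spill(counts, run_dir, len(runs)))

        # Merge the sorted runs sequentially; each run is read front to back.
        streams = [_read_run(path, block_bytes) for path in runs]
        merged = heapq.merge(*streams, key=itemgetter(0))
        for item, group in groupby(merged, key=itemgetter(0)):
            yield item, sum(count for _, count in group)
```

Calling `dict(bounded_item_counts(transactions, max_entries=2))` on a small list of transactions gives the same supports as an unbounded counter, while the in-memory table never holds more than two entries. Writing sorted runs is one simple way to keep disk access sequential; the paper's own structure and block layout may differ.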
