Fast Mining Frequent Patterns with Secondary Memory

Data mining has been widely studied and applied in recent years, and frequent pattern mining is one of its most important subfields, popular in both academia and industry. As databases grow, however, they can become too large to mine within the available main memory. In this study, we propose a novel algorithm called Hybrid Mine (H-Mine) to address this problem. H-Mine writes the part of the mining information that does not fit in memory to the hard disk and mines using both disk and memory, so that mining can be completed with limited memory. Empirical evaluation under various simulation conditions shows that H-Mine performs well in terms of execution efficiency and scalability.
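
The general idea of combining memory and disk can be illustrated with a minimal sketch. This is not the H-Mine algorithm itself, whose details are not given here. It only shows a generic out-of-core technique: counting item supports under a memory budget and spilling partial counts to temporary files. The function name `frequent_items` and the parameters `min_support` and `max_in_memory` are hypothetical names chosen for this illustration.

```python
import os
import pickle
import tempfile
from collections import Counter


def frequent_items(transactions, min_support, max_in_memory=1000):
    """Return {item: support} for items with support >= min_support.

    Illustrative only: keeps at most roughly `max_in_memory` distinct
    counters in RAM and spills partial counts to disk when exceeded.
    """
    counts = Counter()
    spill_paths = []

    for transaction in transactions:
        # Each item contributes at most once per transaction.
        counts.update(set(transaction))
        if len(counts) > max_in_memory:
            # Memory budget exceeded: write partial counts to disk, then reset.
            fd, path = tempfile.mkstemp(suffix=".counts")
            with os.fdopen(fd, "wb") as f:
                pickle.dump(dict(counts), f)
            spill_paths.append(path)
            counts.clear()

    # Merge the in-memory remainder with every spilled partial count.
    # Simplification: the merged totals are held in memory here.
    total = Counter(counts)
    for path in spill_paths:
        with open(path, "rb") as f:
            total.update(pickle.load(f))
        os.remove(path)

    return {item: n for item, n in total.items() if n >= min_support}


if __name__ == "__main__":
    demo = [["a", "b"], ["a", "c"], ["a", "b", "d"]]
    print(frequent_items(demo, min_support=2, max_in_memory=2))
```

The final merge in this sketch loads all totals into memory, which a real secondary-memory miner would avoid, for example by merging sorted runs in a streaming fashion.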
