Efficient algorithms for frequent pattern mining in many-task computing environments

The goal of data mining is to discover hidden, useful information in large databases, and mining frequent patterns from transaction databases is one of its central problems. As a database grows, so do the computation time and memory that mining requires; as the number of items grows, the user behaviour captured in the data also becomes more complex. To cope with this growth, many researchers have applied parallel and distributed computing techniques to the discovery of frequent patterns in large amounts of data. However, most studies have focused on improving performance for a single task and have neglected many-task computing, which matters in current cloud computing environments. In these environments, an application is often provided as a service (e.g., the Google search engine), so many users may use it simultaneously. In this paper, we propose a set of algorithms for frequent pattern mining that provides a fast and scalable mining service in many-task computing environments: the Equal Working Set (EWS) algorithm, the Request On Demand (ROD) algorithm, the Small Size Working Set (SSWS) algorithm and the Progressive Size Working Set (PSWS) algorithm. Empirical evaluations under various simulation conditions show that the proposed algorithms deliver excellent performance in terms of scalability and execution time.
