An empirical study on mining sequential patterns in a grid computing environment

Mining sequential patterns (MSP) is an important task in knowledge discovery and data mining (KDD). As in most KDD tasks, MSP involves many iterations of generating, adjusting, and comparing data. This paper presents an empirical study of deploying MSP in a grid computing environment and demonstrates the effectiveness and performance improvements gained from this deployment. GSP (Generalized Sequential Patterns), a typical MSP method, is the mining algorithm investigated. A grid computing environment is designed and implemented in which all GSP functions are organized as loosely coupled web services. MSP is achieved through the cooperation of these web services using a divide-and-conquer strategy. Several monitoring mechanisms are developed to help manage the MSP process. The experimental results show that the proposed grid computing environment provides a flexible and efficient platform for MSP.
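The following is a minimal sketch of one way GSP's level-wise search could be split by divide and conquer. It assumes a count-distribution scheme: the sequence database is partitioned, each partition counts candidate supports on its own, and a coordinator adds the local counts together before the next round of candidate generation. The abstract does not describe how the paper actually decomposes GSP into web services. The decomposition shown here, the function names (`count_partition`, `generate_candidates`, `mine`), and the thread pool used in place of remote web-service calls are all hypothetical. The sketch also simplifies GSP to sequences whose elements are single items, and it omits time constraints and taxonomies.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Pattern, sequence: Sequence[str]) -> bool:
    """True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def count_partition(partition: List[Sequence[str]], candidates: List[Pattern]) -> Counter:
    """Local support counting for one partition.

    Hypothetical stand-in for a single counting web service on one grid node.
    """
    counts: Counter = Counter()
    for seq in partition:
        for cand in candidates:
            if is_subsequence(cand, seq):
                counts[cand] += 1
    return counts


def generate_candidates(frequent: List[Pattern]) -> List[Pattern]:
    """GSP-style join and prune, simplified to single-item elements."""
    freq_set = set(frequent)
    candidates = set()
    for s1 in frequent:
        for s2 in frequent:
            # Join: s1 without its first item must equal s2 without its last item.
            if s1[1:] == s2[:-1]:
                cand = s1 + s2[-1:]
                # Prune: every (k-1)-subsequence must be frequent (Apriori property;
                # valid here because no max-gap constraint is used).
                if all(sub in freq_set for sub in combinations(cand, len(cand) - 1)):
                    candidates.add(cand)
    return sorted(candidates)


def mine(database: List[Sequence[str]], min_support: int,
         n_partitions: int = 2) -> Dict[Pattern, int]:
    """Level-wise mining with counting distributed over database partitions."""
    # Divide: split the database into disjoint partitions.
    partitions = [database[i::n_partitions] for i in range(n_partitions)]
    items = sorted({item for seq in database for item in seq})
    candidates: List[Pattern] = [(item,) for item in items]
    result: Dict[Pattern, int] = {}
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        while candidates:
            # Each partition counts the same candidate set independently.
            partials = pool.map(count_partition, partitions,
                                [candidates] * n_partitions)
            # Conquer: supports are additive across disjoint partitions.
            total: Counter = Counter()
            for part in partials:
                total.update(part)
            frequent = sorted(c for c in candidates if total[c] >= min_support)
            result.update({c: total[c] for c in frequent})
            candidates = generate_candidates(frequent)
    return result


if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "c"], ["a", "b", "c", "d"], ["b", "c"]]
    for pattern, support in sorted(mine(db, min_support=2).items()):
        print(pattern, support)
```

Support counts add up across disjoint partitions, so merging the local counts produces the same global supports as one pass over the whole database. The number of partitions only changes how the counting work is distributed. In a real grid deployment, the thread pool would presumably be replaced by calls to remote services, along with the monitoring the abstract mentions.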
