An empirical study on mining sequential patterns in a grid computing environment

Mining sequential patterns (MSP) is an important task in knowledge discovery and data mining (KDD). As in most KDD tasks, MSP requires many iterations of generating, adjusting, and comparing data. This paper presents an empirical study of deploying MSP in a grid computing environment and demonstrates the effectiveness and performance improvements gained from this deployment. GSP, a typical MSP method, is the mining algorithm investigated. A grid computing environment is designed and implemented in which all GSP functions are organized as loosely coupled web services. MSP is achieved through the cooperation of these web services using a divide-and-conquer strategy. Several monitoring mechanisms are developed to help manage the MSP process. The experimental results show that the proposed grid computing environment provides a flexible and efficient platform for MSP.
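
The divide-and-conquer idea can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes sequences of single items rather than itemsets, omits the GSP pruning step and time constraints, and uses local threads as stand-ins for the web services. All function names (is_subsequence, count_partition, join, mine) are hypothetical. Each worker counts candidate support on its own partition of the database, and the partial counts are merged before the next round of candidates is generated.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def is_subsequence(pattern, sequence):
    """True if pattern occurs in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def count_partition(partition, candidates):
    """Stand-in for one counting service: support counts on one partition."""
    counts = Counter()
    for seq in partition:
        for cand in candidates:
            if is_subsequence(cand, seq):
                counts[cand] += 1
    return counts


def join(frequent):
    """GSP-style join: merge a and b when a minus its first item
    equals b minus its last item (pruning step omitted)."""
    return sorted({a + (b[-1],)
                   for a in frequent for b in frequent
                   if a[1:] == b[:-1]})


def mine(database, min_support, n_workers=2):
    """Level-wise mining with support counting split across workers."""
    # Divide: each worker gets a disjoint slice of the database.
    partitions = [database[i::n_workers] for i in range(n_workers)]
    candidates = sorted({(item,) for seq in database for item in seq})
    result = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while candidates:
            partial = pool.map(count_partition, partitions,
                               [candidates] * n_workers)
            # Conquer: merge the partial counts into global support.
            total = sum(partial, Counter())
            frequent = {c: n for c, n in total.items() if n >= min_support}
            result.update(frequent)
            candidates = join(list(frequent))
    return result


if __name__ == "__main__":
    db = [("a", "b", "c"), ("a", "c"), ("a", "b"), ("b", "c")]
    for pattern, support in sorted(mine(db, min_support=2).items()):
        print(pattern, support)
```

Because support is additive over disjoint partitions, the merged counts are the same as those from a single pass over the whole database, whatever the number of workers.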
