Research on a Scalable Parallel Data Mining Algorithm

Sequential pattern mining is an active area of knowledge discovery and has been studied extensively by data mining researchers for more than a decade. With steady advances in hardware and software technologies, real-world applications such as network monitoring systems and sensor grids increasingly generate huge amounts of streaming data, and processing such data calls for efficient and scalable parallel algorithms. Starting from the common shortcomings of current sequential pattern mining algorithms and a study of serial sequential pattern mining algorithms, this paper proposes a scalable parallel sequential pattern mining algorithm based on sequential patterns and projected databases. Theoretical analysis and experiments indicate that the parallel algorithm effectively reduces both computational and space complexity and improves mining efficiency on massive datasets.
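
The abstract does not spell out the algorithm itself. As a rough illustration of the general idea it names, mining projected databases in parallel, here is a minimal Python sketch. It assumes a PrefixSpan-style pattern-growth approach (not necessarily the paper's method), sequences whose elements are single items rather than itemsets, and a search space partitioned by frequent length-1 prefix so that each partition's projected database can be mined independently. The function names `project`, `frequent_items`, `mine_prefix` and `mine_parallel` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Assumption: a sequence is a tuple of single items (no itemset elements).
Seq = Tuple[str, ...]
Pattern = Tuple[str, ...]


def project(db: List[Seq], item: str) -> List[Seq]:
    """Projected database: the suffix after the first occurrence of `item`
    in each sequence containing it. Empty suffixes are dropped because they
    cannot support any further extension."""
    projected = []
    for seq in db:
        if item in seq:
            suffix = seq[seq.index(item) + 1:]
            if suffix:
                projected.append(suffix)
    return projected


def frequent_items(db: List[Seq], min_support: int) -> Dict[str, int]:
    """Items whose sequence support (count once per sequence) meets min_support."""
    counts = Counter(item for seq in db for item in set(seq))
    return {item: c for item, c in counts.items() if c >= min_support}


def mine_prefix(prefix: Pattern, projected_db: List[Seq],
                min_support: int) -> Dict[Pattern, int]:
    """Serial pattern growth: extend `prefix` by each frequent item in its
    projected database, then recurse on the newly projected database."""
    results: Dict[Pattern, int] = {}
    for item, support in frequent_items(projected_db, min_support).items():
        new_prefix = prefix + (item,)
        results[new_prefix] = support
        results.update(mine_prefix(new_prefix, project(projected_db, item), min_support))
    return results


def mine_parallel(db: List[Seq], min_support: int, max_workers: int = 4,
                  executor_cls=ThreadPoolExecutor) -> Dict[Pattern, int]:
    """Hypothetical parallel driver: one task per frequent length-1 prefix.
    Partitions never share a first item, so their results do not overlap."""
    level1 = frequent_items(db, min_support)
    results: Dict[Pattern, int] = {(item,): s for item, s in level1.items()}
    with executor_cls(max_workers=max_workers) as pool:
        futures = [pool.submit(mine_prefix, (item,), project(db, item), min_support)
                   for item in level1]
        for future in futures:
            results.update(future.result())
    return results
```

For example, `mine_parallel([("a", "b", "c"), ("a", "c"), ("a", "b", "c", "b"), ("b", "c")], 2)` finds `("a", "b", "c")` with support 2 and `("b", "c")` with support 3. `ThreadPoolExecutor` is used here only to show the task decomposition; because CPython threads do not run Python code in parallel, passing `executor_cls=ProcessPoolExecutor` (the worker functions are module-level and therefore picklable, but the call should sit under an `if __name__ == "__main__":` guard) would be the more realistic choice for CPU-bound mining.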
