A Node Linkage Approach for Sequential Pattern Mining

Sequential pattern mining is a widely studied problem in data mining, with applications that include Web usage analysis, purchase behavior analysis, and text mining. However, as data volumes grow dramatically, current approaches become inefficient on large input datasets, a large number of distinct symbols, and low minimum support thresholds. In this paper, we propose a new sequential pattern mining algorithm that follows a pattern-growth scheme to discover sequential patterns. Unlike most pattern-growth algorithms, our approach does not build a data structure to represent the input dataset; instead, it accesses the required sequences through pseudo-projection databases, which improves runtime and reduces memory requirements. The algorithm traverses the search space depth-first and keeps in memory only a pattern node linkage and the pseudo-projections required for the branch currently being explored. Experimental results show that our approach, the Node Linkage Depth-First Traversal algorithm (NLDFT), outperforms and scales better than state-of-the-art algorithms.
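To make the idea of depth-first pattern growth over pseudo-projections concrete, the following is a minimal sketch in the generic PrefixSpan style. It is not the NLDFT algorithm itself: it omits the pattern node linkage, assumes each sequence is a list of single symbols, and the names `mine`, `grow`, and `min_support` are hypothetical. A pseudo-projection is represented here as a list of (sequence index, start offset) pairs, so no suffix of the input is ever copied.

```python
from collections import defaultdict


def mine(sequences, min_support):
    """Return {pattern: support} for every frequent sequential pattern.

    Illustrative sketch only: a pseudo-projection is a list of
    (sequence_index, start_offset) pointers into the original data.
    """
    results = {}

    def grow(prefix, projection):
        # Collect, per symbol, the first occurrence at or after each offset.
        extensions = defaultdict(list)
        for seq_id, start in projection:
            seq = sequences[seq_id]
            seen = set()
            for pos in range(start, len(seq)):
                symbol = seq[pos]
                if symbol not in seen:
                    seen.add(symbol)
                    # The new pointer starts just after the matched symbol.
                    extensions[symbol].append((seq_id, pos + 1))
        # Depth-first: fully explore one extension before the next, so only
        # the projections of the current branch are alive on the call stack.
        for symbol, next_projection in sorted(extensions.items()):
            if len(next_projection) >= min_support:
                pattern = prefix + (symbol,)
                results[pattern] = len(next_projection)
                grow(pattern, next_projection)

    grow((), [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical toy input: three short symbol sequences.
patterns = mine([["a", "b", "c"], ["a", "c", "b"], ["b", "c"]], min_support=2)
```

Because each recursive call holds only its own list of pointers, memory use in this sketch grows with the depth of the branch being explored rather than with the size of the whole search space, which is the property the abstract attributes to pseudo-projection.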
