Maintaining the discovered sequential patterns for sequence insertion in dynamic databases

Mining useful information or knowledge from large databases has become a critical issue in recent years. Sequential patterns can be applied in many domains, such as basket analysis, biological data, and web click streams, to analyze customer or user behavior. Conventional approaches re-mine the whole updated database in batch mode whenever sequences change. The fast updated sequential pattern (FUSP)-tree was proposed to update the discovered sequential patterns for both sequence insertion and deletion. The original database must still be rescanned, however, when a small sequence that was not kept in the FUSP-tree has to be maintained. The pre-large concept was proposed for maintaining mined knowledge in dynamic databases, and it outperforms the FUP concept. In this paper, we adapt the pre-large concept to the FUSP-tree structure for sequence insertion. A FUSP-tree is built in advance to keep the large 1-sequences for later maintenance. The pre-large sequences are also kept to reduce the movement of sequences from large to small and vice versa. When the number of inserted sequences is smaller than the safety bound of the pre-large concept, the proposed incremental algorithm for sequence insertion in dynamic databases obtains better results.
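
The abstract does not spell out the safety bound or the thresholds, so the following is only a minimal sketch under stated assumptions. It assumes the bound commonly given for the pre-large concept under insertion, f = floor((Su - Sl) * d / (1 - Su)), where Su and Sl are the upper and lower support thresholds and d is the number of sequences in the original database. The function names, the threshold values, and the use of `Fraction` (to avoid floating-point rounding) are illustrative choices, not taken from the paper.

```python
from fractions import Fraction
import math


def safety_bound(upper, lower, d):
    """Assumed pre-large safety bound for insertion:
    floor((Su - Sl) * d / (1 - Su)).
    upper and lower are support thresholds given as Fractions, with
    0 <= lower < upper < 1; d is the original database size."""
    return math.floor((upper - lower) * d / (1 - upper))


def classify(count, total, upper, lower):
    """Label a sequence as large, pre-large, or small by its support ratio."""
    ratio = Fraction(count, total)
    if ratio >= upper:
        return "large"
    if ratio >= lower:
        return "pre-large"
    return "small"


def rescan_needed(inserted_so_far, bound):
    """Hypothetical check: a rescan of the original database is only
    considered once the accumulated insertions exceed the bound."""
    return inserted_so_far > bound


# Illustrative values only: Su = 50%, Sl = 30%, 100 original sequences.
bound = safety_bound(Fraction(1, 2), Fraction(3, 10), 100)
```

With these illustrative thresholds, the bound works out to 40 inserted sequences. A sequence with support between 30% and 50% is kept as pre-large, which acts as a buffer between the large and small sets.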
