Subgroup Discovery in Sequential Databases

Sequential pattern mining produces a vast number of frequent patterns, both because the problem is combinatorial and because the results contain redundant information. Several techniques (e.g., closed patterns, maximal patterns, gap constraints, and recency and compactness constraints) have been studied to reduce the size of the results. However, these approaches can still generate large result sets that often contain patterns of little use or interest to end users. Even when many interesting results are returned, finding the most useful ones can be difficult. By applying ideas from subgroup discovery to sequential pattern mining, we have developed exact and heuristic algorithms for identifying and ranking the top-k most significant patterns from the complete collection of frequent patterns.
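As a rough illustration of how a subgroup-discovery quality measure could rank sequential patterns, the sketch below scores each candidate with weighted relative accuracy (WRAcc) against a binary target and keeps the k best. The abstract does not say which quality measure or search strategy the algorithms use, so WRAcc, the labeled database layout, and the function names is_subsequence, wracc, and top_k_patterns are all assumptions made for this example. Sequences are simplified to lists of single items rather than itemsets.

```python
from heapq import nlargest


def is_subsequence(pattern, sequence):
    """Return True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` consumes the iterator, so each match must come after the previous one.
    return all(item in it for item in pattern)


def wracc(pattern, database):
    """Weighted relative accuracy of `pattern` with respect to the positive label.

    `database` is a list of (sequence, label) pairs with boolean labels.
    This layout is an assumption for the sketch, not taken from the paper.
    """
    n = len(database)
    covered = [label for seq, label in database if is_subsequence(pattern, seq)]
    if n == 0 or not covered:
        return 0.0
    coverage = len(covered) / n                      # share of sequences matching the pattern
    p_target_given_pattern = sum(covered) / len(covered)
    p_target = sum(label for _, label in database) / n
    return coverage * (p_target_given_pattern - p_target)


def top_k_patterns(patterns, database, k):
    """Rank candidate patterns by WRAcc and return the k highest-scoring ones."""
    return nlargest(k, patterns, key=lambda p: wracc(p, database))


if __name__ == "__main__":
    db = [
        (["a", "b", "c"], True),
        (["a", "c", "b"], True),
        (["b", "c", "a"], False),
        (["c", "a", "b"], False),
    ]
    candidates = [["a", "b"], ["a", "c"], ["c"]]
    for p in top_k_patterns(candidates, db, 2):
        print(p, round(wracc(p, db), 3))
```

This exhaustively scores a precomputed list of candidates, which corresponds loosely to ranking after mining; an exact or heuristic search that prunes during mining would be structured differently.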
