Parallel and quantitative sequential pattern mining for large-scale interval-based temporal data

Sequential pattern mining, the mining of frequent subsequences of patterns, has wide application in customer shopping sequence analysis, web log stream analysis, and multi-modal behavioral studies, to name a few. Detecting unknown, anomalous, and unexpected patterns in large-scale interval-based temporal data without complete a priori knowledge is challenging. In this paper, we present PESMiner, a framework that allows parallel and quantitative mining of sequential patterns at scale. Whereas most existing sequential mining algorithms can only find the sequential order of temporal events, our work presents a novel interactive temporal data mining algorithm capable of extracting precise temporal properties of sequential patterns. Furthermore, our work provides a unified parallel solution that scales our algorithms to larger temporal data sets by exploiting iterative MapReduce tasks. Comprehensive performance evaluations demonstrate that PESMiner significantly outperforms existing interval-based mining algorithms in terms of both quality (i.e., accuracy, precision, and recall) and scalability.
