Predefined pattern detection in large time series

Detecting predefined patterns in time series is an interesting and challenging task. To reduce its computational cost and increase its effectiveness, a number of time series representation methods and similarity measures have been proposed. Most existing methods focus on full sequence matching, that is, on sequences with clearly defined beginnings and endings, where all data points contribute to the match. These methods, however, do not account for temporal and magnitude deformations in the data, and they prove ineffective in several real-world scenarios where noise and external phenomena introduce diversity in the class of patterns to be matched. In this paper, we present a novel pattern detection method based on the notions of templates, landmarks, constraints and trust regions. We employ the Minimum Description Length (MDL) principle in the time series preprocessing step, which helps to preserve all the prominent features and prevents the template from overfitting. Templates are provided by common users or domain experts, and represent the interesting patterns we want to detect in the time series. Instead of matching templates against all the potential subsequences of the time series, we translate the time series and the templates into landmark sequences, and detect patterns in the landmark sequence of the time series. By defining constraints within the template landmark sequence, we effectively extract all the landmark subsequences from the time series landmark sequence, and obtain a number of landmark segments (time series subsequences or instances). We model each landmark segment by scaling the template in both the temporal and magnitude dimensions. To suppress the influence of noise, we introduce the concept of a trust region, which not only helps to achieve an improved instance model, but also helps to locate the accurate boundaries of instances of the given template.
Based on the similarities derived from the instance models, we use a probability density function to calculate a similarity threshold. The threshold is used to judge whether a landmark segment is a true instance of the given template. To evaluate the effectiveness and efficiency of the proposed method, we apply it to two real-world datasets. The results show that our method is capable of detecting patterns that exhibit temporal and magnitude deformations, with competitive performance.
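
As a rough illustration of two of the steps described above (translating a series into landmarks, and modelling a segment by scaling the template in time and magnitude), the following minimal sketch may help. It is not the method of this paper: the abstract does not specify the landmark definition, the constraints, the trust region or the similarity measure. Here we assume that landmarks are simple local extrema, that temporal scaling is linear resampling, that magnitude scaling is a least-squares gain and offset, and that the fit error is a root-mean-square error. The function names `landmarks` and `fit_template` are hypothetical.

```python
import numpy as np


def landmarks(series):
    """Return indices of local extrema plus both endpoints.

    Assumption: a landmark is a point where the slope changes sign.
    Flat plateaus are not treated specially in this sketch.
    """
    x = np.asarray(series, dtype=float)
    slope_sign = np.sign(np.diff(x))
    # A sign change between consecutive slopes marks a turning point.
    turning = np.where(slope_sign[:-1] * slope_sign[1:] < 0)[0] + 1
    return np.concatenate(([0], turning, [len(x) - 1]))


def fit_template(segment, template):
    """Scale a template to a segment in time and magnitude.

    Temporal scaling: linearly resample the template to the segment length.
    Magnitude scaling: least-squares fit of segment ~ a * template + b.
    Returns (a, b, rmse), where rmse is the root-mean-square fit error.
    """
    seg = np.asarray(segment, dtype=float)
    tpl = np.asarray(template, dtype=float)
    # Stretch or compress the template onto the segment's time axis.
    src = np.linspace(0.0, 1.0, len(tpl))
    dst = np.linspace(0.0, 1.0, len(seg))
    stretched = np.interp(dst, src, tpl)
    # Solve for gain a and offset b in the least-squares sense.
    design = np.column_stack([stretched, np.ones_like(stretched)])
    (a, b), *_ = np.linalg.lstsq(design, seg, rcond=None)
    model = a * stretched + b
    rmse = float(np.sqrt(np.mean((seg - model) ** 2)))
    return float(a), float(b), rmse


if __name__ == "__main__":
    series = [0, 1, 2, 1, 0, 1]
    print(landmarks(series))
    # A triangular template, and a segment that is a stretched, amplified copy.
    print(fit_template([3, 4, 5, 4, 3], [0, 1, 0]))
```

In such a sketch, a small fit error would suggest that the segment is a deformed instance of the template; how that error is turned into a similarity score and compared against a threshold is left open here.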
