SeqDTW: A Segmentation Based Distance Measure for Time Series Data

Computing the similarity between time series is a core challenge that arises frequently in time series analysis across many disciplines. In building energy management in particular, many applications need to cluster into one group the time series that rise and peak simultaneously and share similar patterns and pattern averages, while tolerating small amplitude shifts and small phase shifts. Commonly used distance measures such as Euclidean distance and DTW are sensitive to phase shift or amplitude shift, and are therefore unable to identify the similarity between two time series with simultaneous rise, peak, and similar patterns. To address this problem, we propose a novel time series similarity measure, subsequence-based DTW (SeqDTW), built on a time series segmentation approach. SeqDTW attempts to find the best match for each subsequence in the segmented time series. The overall distance is the weighted aggregate of the distances between the matching subsequences. In addition to preserving the monotonicity of the DTW warping path, SeqDTW reduces the number of computations across the distance matrix and avoids singularities. An exhaustive experiment shows the superiority of the proposed method in finding time series that have simultaneous rise, peak, and similar patterns.
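The general idea of a segment-wise DTW distance can be illustrated with a short sketch. This is not the SeqDTW algorithm itself: the abstract does not specify the segmentation method, the matching rule, or the weights, so the sketch assumes fixed-length segments, a search restricted to neighboring segments (the hypothetical `radius` parameter), and weights proportional to segment length. The names `dtw`, `segment`, and `segment_dtw` are illustrative only.

```python
import math


def dtw(a, b):
    """Classic DTW distance with absolute difference as the local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Monotonic, continuous warping path: step from left, below, or diagonal.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def segment(series, seg_len):
    """ASSUMPTION: fixed-length chunks stand in for a real segmentation method."""
    return [series[i:i + seg_len] for i in range(0, len(series), seg_len)]


def segment_dtw(x, y, seg_len=4, radius=1):
    """Weighted aggregate of best-match DTW distances between segments.

    ASSUMPTIONS (not taken from the paper): each segment of x is compared only
    with segments of y whose index is within `radius`, and each best-match
    distance is weighted by the segment's share of the length of x.
    """
    if not x or not y:
        raise ValueError("both series must be non-empty")
    xs, ys = segment(x, seg_len), segment(y, seg_len)
    total = 0.0
    for k, sx in enumerate(xs):
        # Clamp the candidate window so it always holds at least one segment.
        lo = max(0, min(k - radius, len(ys) - 1))
        hi = max(lo + 1, min(len(ys), k + radius + 1))
        best = min(dtw(sx, sy) for sy in ys[lo:hi])
        total += (len(sx) / len(x)) * best
    return total


if __name__ == "__main__":
    load_a = [0, 0, 1, 3, 1, 0, 0, 0]
    load_b = [0, 0, 0, 1, 3, 1, 0, 0]  # same shape, shifted by one step
    print(segment_dtw(load_a, load_a))
    print(segment_dtw(load_a, load_b))
```

Restricting each segment to a small window of candidate segments means only small blocks of the full distance matrix are evaluated, which is one plausible way to obtain the reduction in computation that the abstract mentions.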
