KV-Match: A Subsequence Matching Approach Supporting Normalization and Time Warping

The volume of time series data has exploded with the spread of applications such as data center management and IoT. Subsequence matching is a fundamental task in mining time series data. Existing index-based approaches consider only raw subsequence matching (RSM) and do not support subsequence normalization. UCR Suite can handle the normalized subsequence matching (NSM) problem, but it must scan the full time series. In this paper, we propose a novel problem, the constrained normalized subsequence matching (cNSM) problem, which adds constraints to the NSM problem. cNSM provides a knob to flexibly control the degree of offset shifting and amplitude scaling, which makes it possible to build an index to process the query. We propose a new index structure, KV-index, and a matching algorithm, KV-match. With a single index, our approach supports both the RSM and cNSM problems under either Euclidean distance (ED) or dynamic time warping (DTW). KV-index is a key-value structure that can be easily implemented on local files or HBase tables. To support queries of arbitrary length, we extend KV-match to KV-match_DP, which uses multiple indexes of varied lengths to process the query. We conduct extensive experiments on synthetic and real-world datasets, and the results verify the effectiveness and efficiency of our approach.
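
To make the cNSM idea concrete, the following is a minimal brute-force sketch under ED, not the KV-match algorithm itself (it uses no index and scans every window). The abstract does not spell out the constraints, so their exact form here is an assumption made for illustration: the offset constraint is taken as a bound `beta` on the difference between the subsequence mean and the query mean, and the scaling constraint as a bound `alpha` on the ratio of their standard deviations. The names `cnsm_scan`, `eps`, `alpha`, and `beta` are hypothetical.

```python
import math


def mean_std(xs):
    """Return the mean and population standard deviation of xs."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, math.sqrt(var)


def znorm(xs):
    """Z-normalize xs; a constant sequence maps to all zeros."""
    mu, sigma = mean_std(xs)
    if sigma == 0:
        return [0.0] * len(xs)
    return [(x - mu) / sigma for x in xs]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cnsm_scan(series, query, eps, alpha, beta):
    """Brute-force scan: return start offsets of windows that satisfy
    the assumed cNSM constraints and the normalized distance threshold.

    Assumed constraints (illustrative, not taken from the abstract):
      |mean(S) - mean(Q)| <= beta             (offset shifting)
      1/alpha <= std(S) / std(Q) <= alpha     (amplitude scaling)
      ED(znorm(S), znorm(Q)) <= eps           (normalized matching)
    """
    m = len(query)
    mu_q, sigma_q = mean_std(query)
    q_hat = znorm(query)
    matches = []
    for i in range(len(series) - m + 1):
        window = series[i:i + m]
        mu_s, sigma_s = mean_std(window)
        if abs(mu_s - mu_q) > beta:
            continue  # offset shifted too far
        if sigma_q == 0 or sigma_s == 0:
            continue  # ratio undefined for constant sequences
        ratio = sigma_s / sigma_q
        if ratio < 1.0 / alpha or ratio > alpha:
            continue  # amplitude scaled too much
        if euclidean(znorm(window), q_hat) <= eps:
            matches.append(i)
    return matches


if __name__ == "__main__":
    series = [0, 1, 0, 5, 6, 5, 0, 10, 0]
    query = [0, 1, 0]
    # Tight knobs behave like raw matching; loose knobs approach plain NSM.
    print(cnsm_scan(series, query, eps=0.1, alpha=2.0, beta=1.0))
    print(cnsm_scan(series, query, eps=0.1, alpha=2.0, beta=6.0))
    print(cnsm_scan(series, query, eps=0.1, alpha=20.0, beta=100.0))
```

In this toy series the same shape appears three times: once unchanged, once shifted upward, and once scaled tenfold. Loosening `beta` admits the shifted copy, and loosening `alpha` as well admits the scaled copy, which is the sense in which the constraints act as a knob between RSM-like and NSM-like behavior.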
