Efficient Similarity Search over Future Stream Time Series

With the advance of hardware and communication technologies, stream time series is gaining ever-increasing attention due to its importance in many applications such as financial data processing, network monitoring, Web click-stream analysis, sensor data mining, and anomaly detection. For all of these applications, an efficient and effective similarity search over stream data is essential. Because of the unique characteristics of the stream, for example, data are frequently updated and real-time response is required, the previous approaches proposed for searching through archived data may not work in the stream scenarios. Especially, in the cases where data often arrive periodically for various reasons (for example, the communication congestion or batch processing), queries on such incomplete time series or even future time series may result in inaccuracy using traditional approaches. Therefore, in this paper, we propose three approaches, polynomial, Discrete Fourier Transform (DFT), and probabilistic, to predict the unknown values that have not arrived at the system and answer similarity queries based on the predicted data. We also apply efficient indexes, that is, a multidimensional hash index and a B+-tree, to facilitate the prediction and similarity search on future time series, respectively. Extensive experiments demonstrate the efficiency and effectiveness of our methods for prediction and answering queries.

[1]  Dimitrios Gunopulos,et al.  Discovering similar multidimensional trajectories , 2002, Proceedings 18th International Conference on Data Engineering.

[2]  Eamonn Keogh Exact Indexing of Dynamic Time Warping , 2002, VLDB.

[3]  Christos Faloutsos,et al.  Fast Time Sequence Indexing for Arbitrary Lp Norms , 2000, VLDB.

[4]  David J. C. MacKay,et al.  Bayesian Interpolation , 1992, Neural Computation.

[5]  Donald J. Berndt,et al.  Finding Patterns in Time Series: A Dynamic Programming Approach , 1996, Advances in Knowledge Discovery and Data Mining.

[6]  Raymond T. Ng,et al.  Indexing spatio-temporal trajectories with Chebyshev polynomials , 2004, SIGMOD '04.

[7]  D. Heckerman,et al.  Autoregressive Tree Models for Time-Series Analysis , 2002, SDM.

[8]  Theodore Johnson,et al.  Gigascope: a stream database for network applications , 2003, SIGMOD '03.

[9]  Wesley W. Chu,et al.  An index-based approach for similarity search supporting time warping in large sequence databases , 2001, Proceedings 17th International Conference on Data Engineering.

[10]  Zhengrong Yao,et al.  Evaluating continuous nearest neighbor queries for streaming time series via pre-fetching , 2002, CIKM '02.

[11]  Apostolos N. Papadopoulos,et al.  Efficient similarity search in streaming time sequences , 2004, Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004..

[12]  Eamonn J. Keogh,et al.  A symbolic representation of time series, with implications for streaming algorithms , 2003, DMKD '03.

[13]  Donghui Zhang,et al.  Online event-driven subsequence matching over financial data streams , 2004, SIGMOD '04.

[14]  John S. Boreczky,et al.  Comparison of video shot boundary detection techniques , 1996, J. Electronic Imaging.

[15]  Lei Chen,et al.  Robust and fast similarity search for moving object trajectories , 2005, SIGMOD '05.

[16]  Changzhou Wang,et al.  Supporting Movement Pattern Queries in User-Specified Scales , 2003, IEEE Trans. Knowl. Data Eng..

[17]  Dimitrios Gunopulos,et al.  Finding Similar Time Series , 1997, PKDD.

[18]  Yufei Tao,et al.  Reverse kNN Search in Arbitrary Dimensionality , 2004, VLDB.

[19]  Eamonn J. Keogh,et al.  Finding surprising patterns in a time series database in linear time and space , 2002, KDD.

[20]  Isak Gath,et al.  Unsupervised Optimal Fuzzy Clustering , 1989, IEEE Trans. Pattern Anal. Mach. Intell..

[21]  Ambuj K. Singh,et al.  A unified framework for monitoring data streams in real time , 2005, 21st International Conference on Data Engineering (ICDE'05).

[22]  Amir B. Geva,et al.  A new algorithm for time series prediction by temporal fuzzy clustering , 2000, Proceedings 15th International Conference on Pattern Recognition. ICPR-2000.

[23]  László Györfi,et al.  A simple randomized algorithm for sequential prediction of ergodic time series , 1999, IEEE Trans. Inf. Theory.

[24]  Intaek Kim,et al.  A fuzzy time series prediction method based on consecutive values , 1999, FUZZ-IEEE'99. 1999 IEEE International Fuzzy Systems. Conference Proceedings (Cat. No.99CH36315).

[25]  M. Moraud Wavelet Networks , 2018, Foundations of Wavelet Networks and Applications.

[26]  Dennis Shasha,et al.  Efficient elastic burst detection in data streams , 2003, KDD '03.

[27]  Hans-Peter Kriegel,et al.  Optimal multi-step k-nearest neighbor search , 1998, SIGMOD '98.

[28]  Geoffrey C. Fox,et al.  A deterministic annealing approach to clustering , 1990, Pattern Recognit. Lett..

[29]  Jennifer Widom,et al.  Models and issues in data stream systems , 2002, PODS.

[30]  Johan A. K. Suykens,et al.  Financial time series prediction using least squares support vector machines within the evidence framework , 2001, IEEE Trans. Neural Networks.

[31]  Ambuj K. Singh,et al.  Dimensionality reduction for similarity searching in dynamic databases , 1998, SIGMOD '98.

[32]  Christos Faloutsos,et al.  Efficient retrieval of similar time sequences under time warping , 1998, Proceedings 14th International Conference on Data Engineering.

[33]  Zhiping Lin,et al.  Predicting time series with wavelet packet neural networks , 2001, IJCNN'01. International Joint Conference on Neural Networks. Proceedings (Cat. No.01CH37222).

[34]  Philip S. Yu,et al.  Adaptive query processing for time-series data , 1999, KDD '99.

[35]  Shu Lin,et al.  An Extendible Hash for Multi-Precision Similarity Querying of Image Databases , 2001, VLDB.

[36]  Christos Faloutsos,et al.  Efficient Similarity Search In Sequence Databases , 1993, FODO.

[37]  Ricardo Vilalta,et al.  Predicting rare events in temporal domains , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[38]  Dennis Shasha,et al.  Warping indexes with envelope transforms for query by humming , 2003, SIGMOD '03.

[39]  Changzhou Wang,et al.  Supporting content-based searches on time series via approximation , 2000, Proceedings. 12th International Conference on Scientific and Statistica Database Management.

[40]  Dennis Shasha,et al.  StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time , 2002, VLDB.

[41]  Renée J. Miller,et al.  Similarity search over time-series data using wavelets , 2002, Proceedings 18th International Conference on Data Engineering.

[42]  Yunhao Liu,et al.  Contour map matching for event detection in sensor networks , 2006, SIGMOD Conference.

[43]  Hans-Peter Kriegel,et al.  The R*-tree: an efficient and robust access method for points and rectangles , 1990, SIGMOD '90.

[44]  Clu-istos Foutsos,et al.  Fast subsequence matching in time-series databases , 1994, SIGMOD '94.

[45]  Yun-tao Qian,et al.  Markov model based time series similarity measuring , 2003, Proceedings of the 2003 International Conference on Machine Learning and Cybernetics (IEEE Cat. No.03EX693).

[46]  Lei Chen,et al.  On The Marriage of Lp-norms and Edit Distance , 2004, VLDB.

[47]  Like Gao,et al.  Continually evaluating similarity-based pattern queries on a streaming time series , 2002, SIGMOD '02.

[48]  Yufei Tao,et al.  Multidimensional reverse kNN search , 2007, The VLDB Journal.

[49]  Konstantinos Kalpakis,et al.  Distance measures for effective clustering of ARIMA time-series , 2001, Proceedings 2001 IEEE International Conference on Data Mining.