Time Series Indexing Taking Advantage of the Generalized Suffix Tree

A time series is a collection of observations made sequentially over time. Time series appear in several application areas, such as finance, marketing, agriculture, weather, and industrial and scientific data gathering. Similarity search over time series databases is an important tool for extracting knowledge from them. In this article, we propose Telesto, a novel indexing approach for similarity search over time series, based on discretized time series and generalized suffix trees. Telesto first discretizes the time series and represents them as strings, using the Symbolic Aggregate Approximation (SAX) technique as a basis. These strings are then indexed using a generalized suffix tree. To support both range and nearest neighbor queries over discretized time series, Telesto extends the suffix tree substring search algorithm by calculating distances between the discretized time series. Performance tests showed that Telesto scales with increasing database and query sizes and is very efficient for similarity queries over large real-world time series databases.
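The two stages described above (discretizing a series into a string, then indexing the strings by their suffixes) can be sketched as follows. This is a minimal illustration under stated assumptions, not Telesto's implementation: the function names `sax`, `build_suffix_trie`, and `find_substring` are hypothetical, the alphabet size and segment count are arbitrary choices, the index is a naive uncompressed suffix trie rather than a true generalized suffix tree, and the lookup is exact substring matching without the distance calculations that Telesto adds for range and nearest neighbor queries.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax(series, n_segments, alphabet="abcd"):
    """Discretize a numeric series into a SAX-style string (illustrative).

    Assumes 1 <= n_segments <= len(series).
    """
    mu, sigma = mean(series), pstdev(series)
    # z-normalize; a constant series maps to all zeros
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # Piecewise Aggregate Approximation: mean of each segment
    n = len(z)
    paa = [
        mean(z[i * n // n_segments:(i + 1) * n // n_segments])
        for i in range(n_segments)
    ]
    # Breakpoints split the standard normal into equiprobable regions
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


def build_suffix_trie(words):
    """Naive generalized suffix trie: insert every suffix of every word.

    Each node records the ids of the words whose suffixes pass through it.
    This takes quadratic space; a real suffix tree compresses the paths.
    """
    root = {"ids": set(), "children": {}}
    for wid, word in enumerate(words):
        for start in range(len(word)):
            node = root
            for ch in word[start:]:
                node = node["children"].setdefault(
                    ch, {"ids": set(), "children": {}}
                )
                node["ids"].add(wid)
    return root


def find_substring(root, pattern):
    """Return the ids of the words that contain `pattern` as a substring."""
    node = root
    for ch in pattern:
        node = node["children"].get(ch)
        if node is None:
            return set()
    return set(node["ids"])


if __name__ == "__main__":
    word = sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4)
    index = build_suffix_trie([word, "dcba", "bbcd"])
    print(word, find_substring(index, "cd"))
```

Storing word ids at every trie node keeps the lookup a simple walk along the pattern; a distance-aware search would instead have to explore several branches per symbol, which is the part this sketch leaves out.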
