Paper information - Extended SAX: Extension of Symbolic Aggregate Approximation for Financial Time Series Data Representation

Efficient and accurate similarity search over large time series data sets is an important but non-trivial problem. Many dimensionality reduction techniques have been proposed to represent time series data effectively for such similarity search, including Singular Value Decomposition (SVD), the Discrete Fourier Transform (DFT), Adaptive Piecewise Constant Approximation (APCA), and the recently proposed Symbolic Aggregate Approximation (SAX). In this work we propose an extension of SAX, called Extended SAX, for the efficient and accurate discovery of important patterns, as required by financial applications. The original SAX approach achieves very good dimensionality reduction and allows distance measures to be defined on the symbolic representation. However, SAX is based on the Piecewise Aggregate Approximation (PAA), which reduces dimensionality by replacing each equal-sized frame with its mean value. This mean-based representation is likely to miss important patterns in some time series data, such as financial time series. Extended SAX, proposed in this paper, uses two additional points in each equal-sized frame, the maximum and the minimum, besides the mean value to approximate the data. We show that Extended SAX can improve representation precision without losing the symbolic nature of the original SAX representation. We empirically compare Extended SAX with the original SAX approach and demonstrate the improvement in quality.
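The difference between the two representations can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the alphabet size of 3, and the fixed (min, mean, max) symbol order within each frame are assumptions made here for illustration, since the abstract does not specify how the three symbols are ordered. The z-normalization and the equiprobable Gaussian breakpoints follow the usual SAX setup.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale the series to zero mean and unit standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def breakpoints(alphabet_size):
    """Cut points that split N(0, 1) into equiprobable regions."""
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def to_symbol(value, cuts):
    """Map a real value to a letter: 'a' for the lowest region, then 'b', ..."""
    return chr(ord("a") + bisect_right(cuts, value))


def split_frames(series, n_frames):
    """Split the series into equal-sized frames (length must divide evenly)."""
    if len(series) % n_frames != 0:
        raise ValueError("series length must be a multiple of n_frames")
    size = len(series) // n_frames
    return [series[i * size:(i + 1) * size] for i in range(n_frames)]


def sax(series, n_frames, alphabet_size):
    """Original SAX: one symbol per frame, taken from the frame mean (PAA)."""
    cuts = breakpoints(alphabet_size)
    z = znormalize(series)
    return "".join(to_symbol(mean(f), cuts) for f in split_frames(z, n_frames))


def extended_sax(series, n_frames, alphabet_size):
    """Extended SAX sketch: three symbols per frame for min, mean, and max.

    Assumption: symbols are emitted in the fixed order (min, mean, max).
    """
    cuts = breakpoints(alphabet_size)
    z = znormalize(series)
    return [
        to_symbol(min(f), cuts) + to_symbol(mean(f), cuts) + to_symbol(max(f), cuts)
        for f in split_frames(z, n_frames)
    ]


# A flat frame followed by a frame with a sharp spike and dip.
prices = [1.0, 1.0, 1.0, 1.0, 1.0, 9.0, -7.0, 1.0]
print(sax(prices, 2, 3))           # bb
print(extended_sax(prices, 2, 3))  # ['bbb', 'abc']
```

In this toy series both frames have the same mean, so the mean-only representation maps them to the same symbol and the spike in the second frame is lost, whereas the min and max symbols keep it visible.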
