Related Papers

Extended SAX: Extension of Symbolic Aggregate Approximation for Financial Time Series Data Representation

Abstract: Efficient and accurate similarity search over large time series data sets is an important but non-trivial problem. Many dimensionality reduction techniques have been proposed to represent time series data effectively for such similarity search, including Singular Value Decomposition (SVD), the Discrete Fourier Transform (DFT), Adaptive Piecewise Constant Approximation (APCA), and the recently proposed Symbolic Aggregate Approximation (SAX). In this work we propose an extension of SAX, called Extended SAX, for efficient and accurate discovery of the important patterns that financial applications require. The original SAX approach achieves very good dimensionality reduction and allows distance measures to be defined on the symbolic representation, but it builds on the Piecewise Aggregate Approximation (PAA), which reduces dimensionality by replacing each equal-sized frame with its mean value. This mean-based representation is likely to miss important patterns in some time series, such as financial time series. Extended SAX, proposed in this paper, represents each equal-sized frame by two additional points, the maximum and the minimum, alongside the mean value. We show that Extended SAX improves the precision of the representation without losing the symbolic nature of the original SAX representation. We empirically compare Extended SAX with the original SAX approach and demonstrate the improvement in quality.
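The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the z-normalization step, the Gaussian breakpoints (borrowed from standard SAX), and the fixed min, mean, max ordering of the three symbols per frame are all assumptions made for illustration, since the abstract does not specify them.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znormalize(series):
    """Rescale a series to zero mean and unit variance (a constant series maps to zeros)."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def breakpoints(alphabet_size):
    """Cut points that split the standard normal into equiprobable regions, as in standard SAX."""
    nd = NormalDist()
    return [nd.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]


def symbol(value, cuts):
    """Map a real value to a letter: 'a' for the lowest region, 'b' for the next, and so on."""
    return chr(ord("a") + bisect_right(cuts, value))


def frame_stats(series, n_frames):
    """Split the series into equal-sized frames and return (min, mean, max) for each frame."""
    if len(series) % n_frames != 0:
        raise ValueError("series length must be a multiple of n_frames")
    size = len(series) // n_frames
    frames = [series[i * size:(i + 1) * size] for i in range(n_frames)]
    return [(min(f), mean(f), max(f)) for f in frames]


def sax_word(series, n_frames, alphabet_size):
    """Plain SAX: one symbol per frame, taken from the frame mean (PAA)."""
    cuts = breakpoints(alphabet_size)
    stats = frame_stats(znormalize(series), n_frames)
    return "".join(symbol(avg, cuts) for _, avg, _ in stats)


def extended_sax_word(series, n_frames, alphabet_size):
    """Sketch of the extended idea: three symbols per frame.

    Assumption: symbols are emitted in the fixed order min, mean, max.
    """
    cuts = breakpoints(alphabet_size)
    stats = frame_stats(znormalize(series), n_frames)
    return "".join(
        symbol(lo, cuts) + symbol(avg, cuts) + symbol(hi, cuts)
        for lo, avg, hi in stats
    )


# A flat frame followed by a frame with an upward and a downward spike
# that cancel out in the mean.
series = [1, 1, 1, 1, 1, 5, -3, 1]
print(sax_word(series, n_frames=2, alphabet_size=3))           # bb
print(extended_sax_word(series, n_frames=2, alphabet_size=3))  # bbbabc
```

In this toy series both frames have the same mean, so the plain SAX word cannot tell them apart, while the extra minimum and maximum symbols expose the spikes in the second frame. The word becomes three times longer, which is the cost of the added detail in this sketch.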

Cited by
Data Management and Analytics for Medicine and Healthcare. Lecture Notes in Computer Science, 2017.
Advanced Discretisation and Visualisation Methods for Performance Profiling of Wind Turbines. Energies, 2021.
PSEUDo: Interactive Pattern Search in Multivariate Time Series with Locality-Sensitive Hashing and Relevance Feedback. IEEE Transactions on Visualization and Computer Graphics, 2021.
Time series visualization based on shape features. Knowl. Based Syst., 2013.
A Review on Time Series Dimensionality Reduction. 2018.
A multi-breakpoints approach for symbolic discretization of time series. Knowledge and Information Systems, 2020.
A novel Bayesian and Chain Rule Model on symbolic representation for time series classification. 2016 IEEE International Conference on Automation Science and Engineering (CASE), 2016.
Extreme-SAX: Extreme Points Based Symbolic Representation for Time Series Classification. DaWaK, 2020.
Survey of Methods for Time Series Symbolic Aggregate Approximation. ICPCSEE, 2019.
Efficient Classification of Long Time Series by 3-D Dynamic Time Warping. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2017.
Computational intelligence for analysis concerning financial modelling and the adaptive market hypothesis. 2012.
Introduction of Item Constraints to Discover Characteristic Sequential Patterns. Emerging Perspectives in Big Data Warehousing, 2019.
Discovery of Corrosion Patterns using Symbolic Time Series Representation and N-gram Model. 2018.
A novel multisensoric system recording and analyzing human biometric features for biometric and biomedical applications. 2012.
Computational Methods for the Integrated Deterministic and Probabilistic Safety Assessment of a Simplified Cooling Circuit for a Tokamak Superconducting Magnet. 2018.
BEATS: Blocks of Eigenvalues Algorithm for Time Series Segmentation. IEEE Transactions on Knowledge and Data Engineering, 2018.
Data-driven Kernel-based Probabilistic SAX for Time Series Dimensionality Reduction. 2020 28th European Signal Processing Conference (EUSIPCO), 2021.
A novel symbolization technique for time-series outlier detection. 2015 IEEE International Conference on Big Data (Big Data), 2015.
Towards Optimal Symbolization for Time Series Comparisons. 2013 IEEE 13th International Conference on Data Mining Workshops, 2013.
Grasp heuristic for time series compression with piecewise aggregate approximation. RAIRO Oper. Res., 2019.