Financial time series indexing based on low resolution clustering

Abstract

One of the major tasks in time series database applications is time series query. Time series data are typically large in size and high in dimensionality. Unlike traditional data, however, time series cannot be indexed directly in a traditional database system. Moreover, time series of different lengths often coexist in the same database. Developing a time series indexing approach is therefore of fundamental importance for maintaining an acceptable query speed. By identifying the perceptually important points (PIPs) in the time domain, time series of different lengths can be compared and their dimensionality can be greatly reduced. In this paper, a time series indexing approach based on clustering the time series data in low resolution is proposed. The approach is customized for stock time series to cater for their unique behaviors. It carries out the indexing process in the time domain, which is intuitive to ordinary data analysts and makes it particularly attractive in applications such as stock data analysis. The proposed approach is both efficient and effective. As demonstrated by the experiments, it speeds up the time series query process while guaranteeing no false dismissals. In addition, it handles the insertion of new entries into the database without difficulty.
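To make the PIP idea concrete, the following is a minimal sketch of one common way to identify perceptually important points. It is an illustration under stated assumptions and not necessarily the procedure used in the paper. The abstract does not specify the distance measure, so the vertical distance to the line joining the two neighboring PIPs is assumed here. The function names `vertical_distance` and `find_pips` are hypothetical.

```python
from typing import List, Sequence


def vertical_distance(series: Sequence[float], left: int, right: int, i: int) -> float:
    """Vertical distance from point i to the straight line joining
    the points at indices left and right (assumed distance measure)."""
    y_left, y_right = series[left], series[right]
    # Value of the connecting line at position i (linear interpolation).
    y_on_line = y_left + (y_right - y_left) * (i - left) / (right - left)
    return abs(series[i] - y_on_line)


def find_pips(series: Sequence[float], n_pips: int) -> List[int]:
    """Return the sorted indices of up to n_pips perceptually important points.

    The first and last points are always PIPs. Each further PIP is the
    point with the largest vertical distance to the line joining its two
    adjacent, already selected PIPs.
    """
    n = len(series)
    if n < 2 or n_pips < 2:
        raise ValueError("need at least two points and two PIPs")
    n_pips = min(n_pips, n)

    pips = [0, n - 1]
    while len(pips) < n_pips:
        best_dist, best_idx = -1.0, None
        # Scan every gap between adjacent PIPs for the most distant point.
        for left, right in zip(pips, pips[1:]):
            for i in range(left + 1, right):
                d = vertical_distance(series, left, right, i)
                if d > best_dist:
                    best_dist, best_idx = d, i
        if best_idx is None:  # no interior points remain
            break
        pips.append(best_idx)
        pips.sort()
    return pips


if __name__ == "__main__":
    short_series = [1.0, 2.0, 5.0, 3.0, 2.5, 4.0, 1.5]
    long_series = [1.0, 1.2, 2.0, 3.5, 5.0, 4.2, 3.0, 2.7, 2.5, 3.3, 4.0, 2.8, 1.5]
    # Both series are reduced to the same number of points,
    # so they can be compared despite their different lengths.
    print(find_pips(short_series, 4))
    print(find_pips(long_series, 4))
```

Because every series is reduced to the same small number of points, series of different lengths can be compared point by point in this low-resolution form, which is the property the indexing approach relies on.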
