DRSP: Dimension Reduction for Similarity Matching and Pruning of Time Series Data Streams

Similarity matching and join of time series data streams have gained considerable relevance now that large volumes of streaming data are common. The process is widely applied in location tracking, sensor networks, and object positioning and monitoring, to name a few. However, as the size of the data stream increases, so does the cost of retaining all the data needed for similarity matching. We develop a novel framework that addresses the following objectives. First, dimension reduction is performed in the preprocessing stage: large stream data is segmented and reduced, by a technique called Multi-level Segment Means (MSM), into a compact representation that retains all the crucial information. This reduces the space complexity associated with storing large time series data streams. Second, the framework incorporates an effective similarity matching technique to analyze whether new data objects are symmetric to the existing data stream. Finally, a pruning technique filters out the pseudo data object pairs and joins only the relevant pairs. The computational cost of MSM is O(l*ni) and the cost of pruning is O(DRF*wsize*d), where DRF is the Dimension Reduction Factor. We have performed exhaustive experimental trials to show that the proposed framework is both efficient and competent in comparison with earlier works.
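The abstract does not spell out the MSM construction or the pruning rule, so the following is only a minimal sketch of the general idea under stated assumptions: level j of the representation holds 2^(j-1) equal-width segment means, similarity is measured by Euclidean distance, and a pair is pruned when the standard segment-mean lower bound on that distance already exceeds a threshold. The function names (`segment_means`, `msm`, `lower_bound`, `can_prune`) and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
from math import sqrt


def segment_means(series, num_segments):
    """Mean of each of num_segments equal-width segments (assumed layout)."""
    n = len(series)
    if num_segments <= 0 or n % num_segments != 0:
        raise ValueError("series length must be a multiple of num_segments")
    seg_len = n // num_segments
    return [sum(series[i:i + seg_len]) / seg_len for i in range(0, n, seg_len)]


def msm(series, levels):
    """Assumed multi-level form: level j (1-based) has 2**(j-1) segment means."""
    return [segment_means(series, 2 ** j) for j in range(levels)]


def lower_bound(means_a, means_b, seg_len):
    """Segment-mean lower bound on the Euclidean distance of the raw series."""
    return sqrt(seg_len * sum((a - b) ** 2 for a, b in zip(means_a, means_b)))


def euclidean(x, y):
    """Exact Euclidean distance between two equal-length series."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def can_prune(series_a, series_b, levels, threshold):
    """True if some level proves the pair is farther apart than threshold.

    Coarse levels are checked first, so many non-matching pairs can be
    discarded without touching the finer (more expensive) levels.
    """
    n = len(series_a)
    for level_a, level_b in zip(msm(series_a, levels), msm(series_b, levels)):
        seg_len = n // len(level_a)
        if lower_bound(level_a, level_b, seg_len) > threshold:
            return True
    return False


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
print(msm(a, 3))
print(can_prune(a, b, 3, 10.0), can_prune(a, b, 3, 13.0))
```

Because each level's bound never exceeds the true distance, pruning on it cannot discard a pair that actually lies within the threshold; pairs that survive every level would still need an exact distance check.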
