TR-SVD: Fast and Memory Efficient Method for Time Ranged Singular Value Decomposition

Given multiple time series data, how can we efficiently find latent patterns in an arbitrary time range? Singular value decomposition (SVD) is a crucial tool for discovering hidden factors in multiple time series data, and has been used in many data mining applications, including dimensionality reduction, principal component analysis, and recommender systems. Alongside the static version, incremental SVD has been used to handle multiple semi-infinite time series and to identify their patterns. However, existing SVD methods for multiple time series analysis cannot detect patterns in an arbitrary time range: standard SVD requires the data for all intervals corresponding to a time range query, and incremental SVD does not consider an arbitrary time range. In this paper, we propose TR-SVD (Time Ranged Singular Value Decomposition), a fast and memory efficient method for finding latent factors of time series data in an arbitrary time range. In the storage phase, TR-SVD incrementally compresses multiple time series data block by block to reduce the space cost; in the query phase, it efficiently computes the SVD for a given time range query by carefully stitching the stored SVD results. Through extensive experiments, we demonstrate that TR-SVD is up to 15x faster and requires 15x less space than existing methods. Our case study shows that TR-SVD is useful for finding past time ranges whose patterns are similar to those of a query time range.
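
The two phases can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `compress_blocks` and `query_range`, the fixed block size, and the restriction to block-aligned query ranges are assumptions made for illustration. The sketch relies on a standard identity: if each block is stored as U_i S_i V_i^T, then the rows of the query range equal blockdiag(U_i) times the vertically stacked S_i V_i^T, so only the small stacked matrix needs to be decomposed at query time.

```python
import numpy as np

def compress_blocks(X, block_size, rank):
    """Storage phase (sketch): keep a truncated SVD of each block of time steps.

    X has one row per time step and one column per time series.
    """
    blocks = []
    for start in range(0, X.shape[0], block_size):
        U, s, Vt = np.linalg.svd(X[start:start + block_size], full_matrices=False)
        k = min(rank, len(s))
        blocks.append((U[:, :k], s[:k], Vt[:k]))
    return blocks

def query_range(blocks, first, last, rank):
    """Query phase (sketch): stitch the stored SVDs of blocks first..last (inclusive)."""
    selected = blocks[first:last + 1]
    # Stack S_i V_i^T for every selected block; this matrix is much smaller
    # than the raw data when the stored rank is small.
    small = np.vstack([s[:, None] * Vt for (_, s, Vt) in selected])
    Uw, sw, Vt = np.linalg.svd(small, full_matrices=False)
    k = min(rank, len(sw))
    # Left factors of the range: blockdiag(U_i) @ Uw, computed block by block.
    parts, row = [], 0
    for (U, s, _) in selected:
        r = len(s)
        parts.append(U @ Uw[row:row + r, :k])
        row += r
    return np.vstack(parts), sw[:k], Vt[:k]
```

For example, with `blocks = compress_blocks(X, block_size=10, rank=5)`, the call `query_range(blocks, 1, 3, rank=5)` returns factors for rows 10 to 39 of `X` without touching the raw data. When the stored rank is smaller than the true rank of a block, the stitched result is an approximation rather than an exact SVD.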
