Cypress: Managing Massive Time Series Streams with Multi-Scale Compressed Trickles

We present Cypress, a novel framework for archiving and querying massive time series streams such as those generated by sensor networks, data centers, and scientific computing. Cypress applies multi-scale analysis to decompose each time series and obtain sparse representations in several domains (e.g., the frequency domain and the time domain). By exploiting this sparsity, Cypress archives the streams in reduced storage space. We then show that many statistical queries, such as trend, histogram, and correlation queries, can be answered directly from the compressed data rather than from reconstructed raw data. Our evaluation on server utilization data collected from real data centers shows that the framework offers significant benefits.
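To make the idea of querying compressed data concrete, the following is a minimal sketch and not the Cypress algorithm itself. It assumes an orthonormal Haar wavelet transform as the multi-scale decomposition, which the abstract does not specify, and all function names (`haar_forward`, `sparsify`, `haar_inverse`, `mean_from_compressed`) are hypothetical. It shows a series being decomposed across scales, stored as a handful of large coefficients, and a simple statistic (the mean) being read from the compressed form without reconstructing the raw samples.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(signal):
    """Orthonormal Haar transform; len(signal) must be a power of two.

    Returns (top, details): the single coarsest coefficient and a list of
    detail-coefficient lists, finest scale first.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = list(signal)
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return approx[0], details

def sparsify(details, keep):
    """Keep the `keep` largest-magnitude detail coefficients.

    The sparse form is a dict {(level, index): value}; zeros are not stored.
    """
    flat = [(abs(c), lvl, i, c)
            for lvl, level_coeffs in enumerate(details)
            for i, c in enumerate(level_coeffs)]
    flat.sort(reverse=True)
    return {(lvl, i): c for _, lvl, i, c in flat[:keep] if c != 0.0}

def haar_inverse(top, sparse, n):
    """Rebuild an approximation of the series from the sparse coefficients."""
    approx = [top]
    for lvl in range(n.bit_length() - 2, -1, -1):  # coarsest to finest
        finer = []
        for i, a in enumerate(approx):
            d = sparse.get((lvl, i), 0.0)
            finer.append((a + d) / SQRT2)
            finer.append((a - d) / SQRT2)
        approx = finer
    return approx

def mean_from_compressed(top, n):
    """Mean of the original series, read from the coarsest coefficient only.

    With the orthonormal scaling used here, top == sum(signal) / sqrt(n).
    """
    return top / math.sqrt(n)

# Example: a step-shaped utilization trace is captured by one detail coefficient.
signal = [2.0, 2.0, 2.0, 2.0, 8.0, 8.0, 8.0, 8.0]
top, details = haar_forward(signal)
sparse = sparsify(details, keep=1)
mean = mean_from_compressed(top, len(signal))
approximation = haar_inverse(top, sparse, len(signal))
```

In this sketch the mean depends only on the coarsest coefficient, so it can be answered without touching the detail coefficients at all. Other queries named in the abstract, such as histograms and correlations, would need more machinery than is shown here.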
