Sprintz

Thanks to the rapid proliferation of connected devices, sensor-generated time series constitute a large and growing portion of the world's data. Often, this data is collected from distributed, resource-constrained devices and centralized at one or more servers. A key challenge in this setup is reducing the size of the transmitted data without sacrificing its quality. Lower quality reduces the data's utility, while smaller size lowers both network and storage costs at the servers and power consumption in the sensing devices. A natural solution is to compress the data at the sensing devices. Unfortunately, existing compression algorithms either violate the memory and latency constraints common for these devices or, as we show experimentally, perform poorly on sensor-generated time series. We introduce a time series compression algorithm that achieves state-of-the-art compression ratios while requiring less than 1 KB of memory and adding virtually no latency. This method is suitable not only for low-power devices collecting data, but also for servers storing and querying data; in the latter context, it can decompress at over 3 GB/s in a single thread, faster than many algorithms with much lower compression ratios. A key component of our method is a high-speed forecasting algorithm that can be trained online and significantly outperforms alternatives such as delta coding. Extensive experiments on datasets from many domains show that these results hold not only for sensor data but also across a wide array of other time series.
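For readers unfamiliar with delta coding, the baseline forecaster named above, the following is a minimal illustrative sketch. It is not the forecasting algorithm introduced in this work. The function names, the zigzag mapping of signed residuals to unsigned integers, and the sample values are all assumptions made for illustration. Delta coding predicts that each sample equals the previous one and stores only the prediction error, which for slowly varying signals tends to need far fewer bits than the raw samples.

```python
from typing import List


def delta_encode(samples: List[int]) -> List[int]:
    """Store each sample as its difference from the previous sample.

    The implicit forecast is "next value == previous value"; the first
    sample is predicted from 0, so it is stored unchanged.
    """
    prev = 0
    residuals = []
    for x in samples:
        residuals.append(x - prev)
        prev = x
    return residuals


def delta_decode(residuals: List[int]) -> List[int]:
    """Invert delta_encode by accumulating the residuals."""
    prev = 0
    samples = []
    for r in residuals:
        prev += r
        samples.append(prev)
    return samples


def zigzag(v: int) -> int:
    """Map signed to unsigned so small magnitudes stay small (assumed scheme).

    0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, 2 -> 4, ...
    """
    return 2 * v if v >= 0 else -2 * v - 1


def bits_needed(values: List[int]) -> int:
    """Bit width sufficient to hold every zigzag-mapped value in the list."""
    return max((zigzag(v).bit_length() for v in values), default=0)


# Hypothetical slowly varying sensor readings.
samples = [100, 102, 103, 103, 101, 98]
residuals = delta_encode(samples)          # [100, 2, 1, 0, -2, -3]

raw_width = bits_needed(samples)           # 8 bits per raw sample
residual_width = bits_needed(residuals[1:])  # 3 bits per residual after the first
```

In this toy example the residuals after the first sample fit in 3 bits each, against 8 bits for the raw values. A better forecaster would shrink the residuals further, which is the role the abstract assigns to its online-trained forecasting algorithm.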
