SIRCS: Slope-intercept-residual Compression by Correlation Sequencing for Multi-stream High Variation Data

Multi-stream data with high variation is ubiquitous in modern network systems. As telecommunication technologies develop, robust data compression techniques are urgently needed. In this paper, we introduce SIRCS, a novel technique designed specifically for high-variation signal data. SIRCS applies a linear regression model to decompose multiple data streams into slope, intercept, and residual components, and combines this decomposition with a tree mapping technique. SIRCS inherits the advantages of existing grouping compression algorithms such as GAMPS. With its newly proposed correlation sorting technique, the correlation tree mapping, SIRCS improves the compression ratio by 13% over the traditional clustering mapping scheme. The linear model decomposition further improves performance over state-of-the-art algorithms: RMSE decreases by 4%, and compression time drops dramatically compared to GAMPS. Across a wide range of error tolerances, from 1% to 27%, SIRCS performs consistently better than all evaluated state-of-the-art algorithms in both compression efficiency and accuracy.
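The slope, intercept, and residual decomposition mentioned above can be illustrated with a minimal sketch. It assumes an ordinary least-squares fit of one stream against a single base stream; the abstract does not specify how SIRCS chooses base streams, orders them in the correlation tree, or quantizes residuals under an error tolerance, so none of that is shown. The function names `fit_line`, `decompose`, and `reconstruct` and the sample values are hypothetical.

```python
# Illustrative sketch only: ordinary least-squares decomposition of one
# stream against a base stream. This is an assumption about the general
# idea, not the SIRCS implementation.

def fit_line(base, stream):
    """Return (slope, intercept) of the least-squares line stream ~ base."""
    n = len(base)
    mean_x = sum(base) / n
    mean_y = sum(stream) / n
    sxx = sum((x - mean_x) ** 2 for x in base)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(base, stream))
    # A constant base stream has no usable slope; fall back to zero.
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return slope, intercept


def decompose(base, stream):
    """Split a stream into slope, intercept, and per-sample residuals."""
    slope, intercept = fit_line(base, stream)
    residual = [y - (slope * x + intercept) for x, y in zip(base, stream)]
    return slope, intercept, residual


def reconstruct(base, slope, intercept, residual):
    """Rebuild the stream from the base stream and its three components."""
    return [slope * x + intercept + r for x, r in zip(base, residual)]


# Hypothetical sample data: a stream that roughly follows 2 * base + 1.
base = [1.0, 2.0, 3.0, 4.0]
stream = [3.1, 4.9, 7.2, 8.8]

slope, intercept, residual = decompose(base, stream)
restored = reconstruct(base, slope, intercept, residual)
```

When two streams are strongly correlated, the residuals are small relative to the original values, which is what makes them cheaper to encode than the raw stream. Without residual quantization, as here, the reconstruction is lossless up to floating-point rounding.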
