Lossless Compression of Data Tables in Mobile Devices by Using Co-clustering

Data tables are widely used in mobile applications to store collections of related records in a structured format. Lossless compression of data tables not only saves storage, but also reduces network transmission latency and battery energy consumption. In this paper, we propose a novel lossless compression approach that combines co-clustering with information coding theory. The approach reorders table rows and columns simultaneously to form homogeneous blocks, then further optimizes the alignment within each block to expose redundancy, so that standard lossless encoders can achieve significantly better compression ratios. We tested the approach on one synthetic dataset and ten real-life UCI datasets, using 7Z as the standard compressor. The experimental results suggest that, compared with direct table compression without co-clustering and within-block alignment, our approach can boost compression rates by at least 21% and up to 68%. The results also show that the compression time of the co-clustering approach grows linearly with the size of the data table. In addition, because the inverse transform of co-clustering is simply an exchange of rows and columns according to recorded indexes, decompression runs very fast, and its time cost is similar to that of decompression without co-clustering. Our approach is therefore suitable for lossless compression of data tables on resource-constrained mobile devices.
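The reorder-then-encode idea, and the inverse transform by recorded indexes, can be illustrated with a minimal Python sketch. This is not the co-clustering algorithm of the paper: the row order below comes from a plain sort, the column order is chosen arbitrarily, and the helper names (`apply_permutation`, `invert_permutation`, `serialize`) and the toy table are assumptions made for illustration. Python's `lzma` module stands in for the 7Z compressor.

```python
import lzma


def apply_permutation(table, row_order, col_order):
    """Return the table with rows and columns rearranged in the given orders."""
    return [[table[r][c] for c in col_order] for r in row_order]


def invert_permutation(table, row_order, col_order):
    """Undo apply_permutation using only the recorded index lists."""
    original = [[None] * len(col_order) for _ in range(len(row_order))]
    for new_r, old_r in enumerate(row_order):
        for new_c, old_c in enumerate(col_order):
            original[old_r][old_c] = table[new_r][new_c]
    return original


def serialize(table):
    """Flatten a table of strings into CSV-like bytes for the encoder."""
    return "\n".join(",".join(row) for row in table).encode("utf-8")


# Toy table (hypothetical data) with interleaved duplicate rows.
table = [
    ["a", "1", "x"],
    ["b", "2", "y"],
    ["a", "1", "x"],
    ["b", "2", "y"],
]

# Stand-in for co-clustering: sorting places identical rows next to each other.
row_order = sorted(range(len(table)), key=lambda r: table[r])
col_order = [0, 2, 1]  # arbitrary column order, chosen only for illustration

reordered = apply_permutation(table, row_order, col_order)
compressed = lzma.compress(serialize(reordered))

# Decompression side: decode, then exchange rows and columns back.
restored = invert_permutation(reordered, row_order, col_order)
```

The inverse step touches each cell once and needs only the two index lists, which is consistent with the observation that decompression cost stays close to that of plain decoding. Note that the index lists must be stored alongside the compressed data.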
