Efficient Data Compression for IoT Devices using Huffman Coding Based Techniques

In the era of Big Data and the Internet of Things (IoT), devices ubiquitously sense and gather data at a rapid pace. Various methods have been proposed to speed up the analysis of this data and the mining of it for information. However, given the resource constraints of most IoT devices, this is challenging. The analysis itself can be handled by a massive array of compute nodes, but the data must first be transferred from the devices to the servers over the network. With such large quantities of data, this transfer requires significant energy and resources. Data compression is therefore a viable option for addressing these issues in IoT data analytics. Since graphs represent most real-world data, including data gathered by IoT devices, methods to compress graphs have been at the forefront of such efforts. In this paper we propose techniques to compress graphs by finding specific patterns and replacing them with variable-length identifiers, an idea inspired by Huffman coding. Specifically, given a graph G = (V, E), where V is the set of vertices, E is the set of edges, and |V| = n, we propose methods to reduce the space requirements of the graph by compressing its adjacency matrix representation. The proposed methods show up to an 80% reduction in the space required to store the graphs compared to the uncompressed adjacency matrix. The methods can also be applied to other representations. The proposed techniques help solve the issues related to transferring data over the network for resource-limited IoT devices, and address the challenges of data analytics in this domain.
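The general idea of replacing recurring patterns in an adjacency matrix with variable-length identifiers can be sketched as follows. This is a minimal illustration, not the method proposed in the paper: it assumes that a "pattern" is a fixed-width block of k bits taken from a row of the matrix, and it uses standard Huffman coding to assign shorter codes to more frequent blocks. The function names (`blocks`, `huffman_code`, `compress`, `decompress`), the block width `k`, and the path-graph example are all hypothetical choices made for this sketch.

```python
import heapq
from collections import Counter
from itertools import count


def blocks(matrix, k):
    """Yield each row's entries in chunks of k (assumed pattern unit)."""
    for row in matrix:
        for i in range(0, len(row), k):
            yield tuple(row[i:i + k])


def huffman_code(freq):
    """Build a prefix code {symbol: bitstring} from a frequency table."""
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tie = count()  # unique tie-breaker so heap never compares dicts
    heap = [(f, next(tie), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def compress(matrix, k=4):
    """Return (encoded bitstring, code table) for an adjacency matrix."""
    symbols = list(blocks(matrix, k))
    code = huffman_code(Counter(symbols))
    return "".join(code[s] for s in symbols), code


def decompress(bits, code, n):
    """Decode the bitstring and rebuild the rows of an n-column matrix."""
    inverse = {b: s for s, b in code.items()}
    flat, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free, so the first match is the symbol
            flat.extend(inverse[current])
            current = ""
    return [flat[i:i + n] for i in range(0, len(flat), n)]


# Hypothetical example: a path graph on n = 8 vertices (sparse adjacency matrix).
n = 8
adj = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
bits, code = compress(adj, k=4)
print("uncompressed bits:", n * n)
print("encoded bits:", len(bits))
```

Note that the encoded length above excludes the code table, which must also be stored or transmitted for decoding. In a sparse matrix the all-zero block dominates and receives the shortest code, which is where most of the saving in this sketch comes from.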
