Optimal Representation of Large-Scale Graph Data Based on K2-Tree

Graphs are widely used to model data in many applications. With the rapid growth of emerging applications such as the Internet of Things, there is an urgent need to process large-scale graphs with billions of vertices. The Web graph is a typical case of graph data and is widely used for analyzing the structure, behavior, and evolution of the World Wide Web. In this paper, we focus on the optimal representation of large-scale Web graphs. Our work is motivated by the need to fit large-scale graphs into main memory and carry out analysis on them. By analyzing the adjacency matrices of Web graphs, we find two characteristics of the distribution of 1s in the matrix. First, only a very small proportion of the elements in the matrix are 1s. Second, the majority of the 1s gather around the principal diagonal and form a small number of clusters. Based on these characteristics, we first develop a clustering mechanism to locate the clusters of 1s in the adjacency matrix. We then combine this clustering mechanism with a structure named the K2-tree and propose an approach for representing large-scale Web graphs compactly. The basic idea of the approach is to compress a large number of zeros into a single zero. Experimental results show that our approach not only reduces the space needed to represent a Web graph but also reduces the time needed for operations such as retrieving the neighbors of a node; compared with existing approaches, it achieves the best space/time tradeoff.
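
To illustrate the idea of compressing many zeros into a single zero, the following is a minimal sketch of a plain K2-tree with k = 2, written in Python. It is not the approach proposed in this paper: the clustering mechanism is not described in the abstract and is omitted here. The function names `build_k2tree` and `neighbors` are hypothetical, and ordinary Python lists stand in for the succinct bitmaps with fast rank support that a real implementation would use.

```python
from collections import deque


def build_k2tree(matrix, k=2):
    """Build a plain K2-tree over a square 0/1 adjacency matrix.

    Returns (T, L, size): T holds the bits of the internal levels,
    L holds the bits of the last level, and size is the matrix side
    after padding with zeros to a power of k.
    """
    n = len(matrix)
    size = k
    while size < n:
        size *= k

    def has_one(r0, c0, s):
        # True if the s x s submatrix at (r0, c0) contains a 1.
        # Cells outside the original matrix are padding zeros.
        return any(
            matrix[r][c]
            for r in range(r0, min(r0 + s, n))
            for c in range(c0, min(c0 + s, n))
        )

    T, L = [], []
    queue = deque([(0, 0, size)])  # breadth-first, so bits come out level by level
    while queue:
        r0, c0, s = queue.popleft()
        sub = s // k
        for i in range(k):
            for j in range(k):
                r, c = r0 + i * sub, c0 + j * sub
                bit = 1 if has_one(r, c, sub) else 0
                if sub == 1:
                    L.append(bit)  # last level: individual matrix cells
                else:
                    T.append(bit)
                    if bit:
                        # Only non-empty submatrices are expanded; an
                        # all-zero submatrix is stored as one 0 bit.
                        queue.append((r, c, sub))
    return T, L, size


def neighbors(T, L, size, row, k=2):
    """Return the columns c with matrix[row][c] == 1 (out-neighbors of row)."""
    bits = T + L
    # rank[p] = number of 1s in T[0:p]; a real K2-tree uses a rank structure.
    rank = [0]
    for b in T:
        rank.append(rank[-1] + b)

    result = []

    def descend(start, s, r, col_base):
        sub = s // k
        i = r // sub  # which row of children contains r
        for j in range(k):
            pos = start + i * k + j
            if bits[pos]:
                c = col_base + j * sub
                if sub == 1:
                    result.append(c)
                else:
                    # Children of the node at pos start at rank1(T, pos) * k^2.
                    descend(rank[pos + 1] * k * k, sub, r % sub, c)

    descend(0, size, row, 0)
    return result


adjacency = [
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
T, L, size = build_k2tree(adjacency)
print(T, L)
print([neighbors(T, L, size, r) for r in range(4)])
```

In this small example the lower-left 2 x 2 submatrix contains only zeros, so it is stored as a single 0 bit in `T` and never expanded. On a sparse matrix whose 1s are clustered, as the abstract describes for Web graphs, most submatrices are pruned in this way.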
