An efficient iterative graph data processing framework based on bulk synchronous parallel model

Graph data processing is widely applied in domains such as industry, science, and social networks, and has therefore attracted considerable research effort. To keep pace with the rapid growth of big graph data, processing based on Pregel-like systems is regarded as one of the most promising approaches and has drawn wide attention from researchers. However, this line of work is still at an early stage and many challenges remain. In Pregel, superstep synchronization is time consuming because iterative graph computation requires multiple synchronizations. Furthermore, the graph data partition strategy adopted by Pregel fails to support load balancing, which increases network I/O overhead as the scale of the graph data grows. To address these issues, this paper presents an efficient computational framework for graph data processing based on the bulk synchronous parallel (BSP) model. The global synchronization control mechanism is improved by counting the number of global message files to determine when the next superstep starts. In addition, an improved graph data partition mechanism based on a balanced hash method is proposed to reduce the communication overhead between the partitions that hold sub-graph computational tasks. We also redesign the PageRank algorithm to verify the effectiveness of the proposed framework. Experimental results on different real-world datasets confirm the efficiency of the proposed framework: it outperforms Giraph (an open source Pregel-like system) by 58% to 69% and achieves a 10× to 17× performance improvement over Hadoop.
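The abstract does not spell out the balanced hash method, so the following Python sketch only illustrates the general idea under stated assumptions: each vertex is first mapped to a partition by a stable hash, and when that partition is already at capacity the vertex is redirected to the currently least-loaded partition. The function names (`stable_hash`, `balanced_hash_partition`), the `slack` parameter, and the overflow rule are hypothetical and are not taken from the paper.

```python
import hashlib
from math import ceil


def stable_hash(vertex_id):
    """Deterministic hash of a vertex id (Python's built-in hash() is salted for str)."""
    digest = hashlib.md5(str(vertex_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def balanced_hash_partition(vertex_ids, num_partitions, slack=1.0):
    """Assign vertices to partitions by hash, capping each partition's size.

    Assumption (not from the paper): a partition may hold at most
    ceil(slack * n / k) vertices; overflow goes to the least-loaded partition.
    """
    vertex_ids = list(vertex_ids)
    capacity = max(1, ceil(slack * len(vertex_ids) / num_partitions))
    loads = [0] * num_partitions
    assignment = {}
    for v in vertex_ids:
        p = stable_hash(v) % num_partitions  # preferred partition from the hash
        if loads[p] >= capacity:
            # Preferred partition is full: fall back to the least-loaded one.
            p = min(range(num_partitions), key=loads.__getitem__)
        assignment[v] = p
        loads[p] += 1
    return assignment, loads


if __name__ == "__main__":
    assignment, loads = balanced_hash_partition(range(1000), 4)
    print(loads)
```

With `slack=1.0` no partition exceeds `ceil(n / k)` vertices, so vertex counts stay even. This sketch balances vertex counts only; it does not account for edge locality or per-vertex degree, which a production partitioner would likely need to consider to reduce cross-partition messages.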
