Enabling application‐aware flexible graph partition mechanism for parallel graph processing systems

With the emergence of large‐scale graph data, Pregel‐like parallel graph processing systems have become an essential tool for processing such data efficiently. The first step in using a Pregel‐like system is to partition the graph into multiple blocks and distribute them across multiple machines. The partition strategy plays a significant role in determining performance: a good partition both balances load and reduces network communication overhead, whereas a poor one does neither. However, existing partition strategies fall short for two reasons: (1) they ignore the features of the application, and (2) they ignore the fact that multiple applications run on the same graph in a production environment. To overcome these drawbacks, we propose the superblock partition strategy. It builds on atomic blocks generated by pre‐processing the original graph, and superblocks can be constructed and re‐constructed dynamically, in real time, according to the submitted applications. The hash‐based and clustering‐based pre‐partition methods are covered in detail. An application feature extraction method and a heuristic superblock partition algorithm are proposed to construct the superblocks. Experimental results show that the superblock partition strategy improves graph processing performance and that its partition efficiency also exceeds that of the hash‐based and topology‐optimal partition strategies. Copyright © 2016 John Wiley & Sons, Ltd.
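
The two-level idea described above (fine-grained atomic blocks first, then superblocks assembled from them) can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the modulo hash, the use of inter-block edge counts as a stand-in for application-specific communication cost, and the greedy placement rule are all assumptions made for illustration.

```python
from collections import defaultdict


def hash_pre_partition(vertices, num_blocks):
    """Hypothetical hash-based pre-partition: vertex id modulo block count."""
    blocks = defaultdict(list)
    for v in vertices:
        blocks[v % num_blocks].append(v)
    return dict(blocks)


def inter_block_edges(edges, num_blocks):
    """Count edges that cross between two different atomic blocks."""
    cut = defaultdict(int)
    for u, v in edges:
        a, b = u % num_blocks, v % num_blocks
        if a != b:
            cut[(min(a, b), max(a, b))] += 1
    return cut


def build_superblocks(blocks, cut, num_workers):
    """Greedy grouping (an assumed heuristic, not the authors' method).

    Each atomic block goes to the superblock that still has room and shares
    the most edges with it; remaining ties go to the least-loaded superblock.
    """
    total = sum(len(members) for members in blocks.values())
    capacity = -(-total // num_workers)  # ceiling division
    superblocks = [[] for _ in range(num_workers)]
    load = [0] * num_workers

    # Place larger atomic blocks first; break ties by block id.
    for b in sorted(blocks, key=lambda k: (-len(blocks[k]), k)):
        def score(w):
            affinity = sum(
                cut.get((min(b, o), max(b, o)), 0) for o in superblocks[w]
            )
            fits = load[w] + len(blocks[b]) <= capacity
            return (fits, affinity, -load[w])

        best = max(range(num_workers), key=score)
        superblocks[best].append(b)
        load[best] += len(blocks[b])
    return superblocks


if __name__ == "__main__":
    edges = [(0, 1), (4, 5), (1, 4), (2, 3), (6, 7), (3, 6), (1, 2)]
    blocks = hash_pre_partition(range(8), 4)
    cut = inter_block_edges(edges, 4)
    print(build_superblocks(blocks, cut, 2))
```

Because the atomic blocks are computed once, only the cheap grouping step would need to be rerun when a different application is submitted. In an application-aware setting, the edge counts would be replaced by weights derived from the application's features.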
