BiFennel: Fast Bipartite Graph Partitioning Algorithm for Big Data

Graph computing is widely used today, and applications such as social network analysis, biological network analysis and semantic processing require the ability to process graphs with billions of vertices rapidly. Graph processing therefore plays a significant role in both research and application development. Data from music and movie recommendation and from LDA topic modeling can be modeled as bipartite graphs and processed with graph processing engines. The most important step before graph computation is graph partitioning. Graph partitioning is a mature technology; however, most classic graph partitioning algorithms must iterate several times, which leads to high time complexity. Several algorithms with short partitioning times have been proposed in recent years, but they cannot be applied directly to bipartite graphs. This paper proposes BiFennel, a new bipartite graph partitioning algorithm that effectively decreases graph processing time and network load by reducing the vertex replication factor while maintaining workload balance. We implement BiFennel in PowerGraph, a popular graph engine. Performance results show that, compared with Aweto, BiFennel improves communication cost by 29% to 55% and overall runtime by 21% to 49%.
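The abstract judges a partitioning by two measures, the vertex replication factor and workload balance, and contrasts fast one-pass heuristics with iterative classic algorithms. As a hedged illustration only, the Python sketch below computes the replication factor of a vertex-cut partitioning (the model PowerGraph uses, where a vertex is replicated on every partition that holds one of its edges) and assigns edges in a single streaming pass with a Fennel-style score. It is not the BiFennel algorithm, whose details are not given here: the scoring rule, the hypothetical function names `replication_factor` and `fennel_style_stream`, and the parameters `alpha`, `gamma` and `slack` (with arbitrary defaults) are all assumptions, and the sketch does not exploit the bipartite structure.

```python
import math


def replication_factor(edges, parts):
    """Average number of partitions holding a replica of each vertex.

    In a vertex-cut, a vertex is replicated on every partition that owns
    at least one of its edges, so 1.0 means no vertex is replicated.
    """
    spans = {}
    for (u, v), p in zip(edges, parts):
        spans.setdefault(u, set()).add(p)
        spans.setdefault(v, set()).add(p)
    return sum(len(s) for s in spans.values()) / len(spans)


def fennel_style_stream(edges, k, alpha=0.1, gamma=1.5, slack=1.0):
    """Assign each streamed edge to one of k partitions in a single pass.

    Hypothetical Fennel-style rule, not BiFennel itself: partition p scores
        [u on p] + [v on p] - alpha * gamma * load[p] ** (gamma - 1)
    and the edge goes to the best-scoring partition below the edge cap.
    alpha, gamma and slack are assumed tuning knobs with arbitrary defaults.
    """
    if slack < 1.0:
        raise ValueError("slack must be >= 1 so that every edge fits")
    cap = math.ceil(slack * len(edges) / k)  # hard per-partition edge limit
    replicas = [set() for _ in range(k)]  # vertices present on each partition
    load = [0] * k  # number of edges assigned to each partition
    parts = []
    for u, v in edges:
        best = max(
            (p for p in range(k) if load[p] < cap),
            key=lambda p: (
                # Reward existing replicas, penalize the marginal load cost.
                (u in replicas[p]) + (v in replicas[p])
                - alpha * gamma * load[p] ** (gamma - 1),
                -load[p],  # on ties, prefer the lighter partition
            ),
        )
        replicas[best].update((u, v))
        load[best] += 1
        parts.append(best)
    return parts, load


# Toy bipartite graph: users u1..u4, items i1..i4, two disconnected blocks.
edges = [("u1", "i1"), ("u1", "i2"), ("u2", "i1"), ("u2", "i2"),
         ("u3", "i3"), ("u3", "i4"), ("u4", "i3"), ("u4", "i4")]
parts, load = fennel_style_stream(edges, k=2)
print(load, replication_factor(edges, parts))  # [4, 4] 1.0
```

On this toy graph the cap of 4 edges per partition keeps the load even, and each user-item block lands on its own partition, so no vertex is replicated. Raising `alpha` weights balance more heavily and tends to raise the replication factor, which is the trade-off the abstract describes.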