High Performance Graph Processing with Locality Oriented Design

Many distributed graph processing frameworks have emerged for helping doing large scale data analysis for many applications including social network and data mining. The existing frameworks usually focus on the system scalability without consideration of local computing performance. We have observed two locality issues which greatly influence the local computing performance in existing systems. One is the locality of the data associated with each vertex/edge. The data are often considered as a logical undividable unit and put into continuous memory. However, it is quite common that for some computing steps, only some portions of data (called as some properties) are needed. The current data layout incurs large amount of interleaved memory access. The other issue is their execution engine applies computation at a granularity of vertex. Making optimization for the locality of source vertex of each edge will often hurt the locality of target vertex or vice versa. We have built a distributed graph processing framework called Photon to address the above issues. Photon employs Property View to store the same type of property for all vertices and edges together. This will improve the locality while doing computation with a portion of properties. Photon also employs an edge-centric execution engine with Hilbert-Order that improve the locality during computation. We have evaluated Photon with five graph applications using five real-world graphs and compared it with four existing systems. The results show that Property View and edge-centric execution design improve graph processing by 2.4X.

[1]  Joaquin Quiñonero Candela,et al.  Web-Scale Bayesian Click-Through rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine , 2010, ICML.

[2]  Sebastiano Vigna,et al.  A large time-aware web graph , 2008, SIGF.

[3]  D. Hilbert Ueber die stetige Abbildung einer Line auf ein Flächenstück , 1891 .

[4]  Hosung Park,et al.  What is Twitter, a social network or a news media? , 2010, WWW '10.

[5]  Herman J. Haverkort,et al.  Locality and bounding-box quality of two-dimensional space-filling curves , 2010, Comput. Geom..

[6]  Binyu Zang,et al.  PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs , 2019, TOPC.

[7]  Reynold Xin,et al.  GraphX: Graph Processing in a Distributed Dataflow Framework , 2014, OSDI.

[8]  Yehuda Koren,et al.  Factorization meets the neighborhood: a multifaceted collaborative filtering model , 2008, KDD.

[9]  Dennis M. Wilkinson,et al.  Large-Scale Parallel Collaborative Filtering for the Netflix Prize , 2008, AAIM.

[10]  Hector Garcia-Molina,et al.  Combating Web Spam with TrustRank , 2004, VLDB.

[11]  Taher H. Haveliwala Topic-sensitive PageRank , 2002, IEEE Trans. Knowl. Data Eng..

[12]  Carlos Guestrin,et al.  Distributed GraphLab : A Framework for Machine Learning and Data Mining in the Cloud , 2012 .

[13]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[14]  Jennifer Widom,et al.  GPS: a graph processing system , 2013, SSDBM.

[15]  Joseph Gonzalez,et al.  PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs , 2012, OSDI.

[16]  Jon M. Kleinberg,et al.  Group formation in large social networks: membership, growth, and evolution , 2006, KDD '06.