GMR: graph-compatible MapReduce programming model

The MapReduce programming model is widely used to parallelize data processing on large-scale clusters of commodity computers. However, because of its uniform key-value data representation, it can neither express graph-parallel algorithms naturally nor execute them efficiently. Graph-targeted frameworks such as Pregel and PowerGraph address these challenges, but they require users to learn a different set of programming patterns and platforms, and they leave legacy MapReduce code incompatible and unusable. In this paper, we propose Graph-compatible MapReduce (GMR), an extension of Google’s Standard MapReduce (SMR). With GMR, graph-parallel algorithms can be expressed naturally without compromising efficiency or simplicity, while the conventional MapReduce programming pattern is preserved. Users also gain the convenience of the “think like a vertex” paradigm. Based on experimental studies, we analyze the proportion of redundant computation, data transmission, and data caching introduced by naive iterative MapReduce platforms (e.g., HaLoop, Twister). Furthermore, we discuss the differences between GMR and graph-targeted frameworks. Evaluation results show that GMR outperforms GraphX on a series of real-world graph-parallel algorithms.
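To make the “think like a vertex” idea and the redundancy of naive iterative MapReduce concrete, here is a hypothetical sketch in plain Python. It is not the GMR API, which is not shown here. It phrases one PageRank superstep as a single map, shuffle and reduce pass, with the shuffle simulated in-process by a dictionary. The names `pagerank_map`, `pagerank_reduce`, `run_iteration` and the damping factor of 0.85 are illustrative assumptions, and the simplified PageRank does not redistribute rank from dangling vertices. Note that the map step must re-emit each vertex's adjacency list on every iteration so that the reducer can rebuild the graph. This is one plausible source of the redundant transmission and caching that the abstract attributes to naive iterative platforms, offered here as an illustration rather than as the paper's analysis.

```python
from collections import defaultdict

# Hypothetical sketch, NOT the GMR API: a vertex-centric PageRank superstep
# written as plain map/shuffle/reduce. Dangling-vertex rank is not redistributed.
DAMPING = 0.85  # assumed damping factor


def pagerank_map(vertex, state):
    """Map: the vertex sends rank/out_degree to each neighbor and re-emits
    its adjacency list so the next iteration still has the graph."""
    rank, neighbors = state
    yield vertex, ("graph", neighbors)  # structure is re-shipped every iteration
    if neighbors:
        share = rank / len(neighbors)
        for n in neighbors:
            yield n, ("msg", share)


def pagerank_reduce(vertex, values, num_vertices):
    """Reduce: the vertex sums incoming messages and computes its new rank."""
    neighbors = []
    total = 0.0
    for kind, payload in values:
        if kind == "graph":
            neighbors = payload
        else:
            total += payload
    new_rank = (1 - DAMPING) / num_vertices + DAMPING * total
    return vertex, (new_rank, neighbors)


def run_iteration(graph):
    """One MapReduce job = one superstep; the shuffle groups values by key."""
    groups = defaultdict(list)
    for vertex, state in graph.items():
        for key, value in pagerank_map(vertex, state):
            groups[key].append(value)
    n = len(graph)
    return dict(pagerank_reduce(v, vals, n) for v, vals in groups.items())


if __name__ == "__main__":
    # graph: vertex -> (rank, adjacency list)
    g = {"a": (1 / 3, ["b", "c"]), "b": (1 / 3, ["c"]), "c": (1 / 3, ["a"])}
    for _ in range(20):
        g = run_iteration(g)
    print({v: round(r, 4) for v, (r, _) in g.items()})
```

Each call to `run_iteration` corresponds to one MapReduce job, so an iterative algorithm launches one job per superstep and pays for moving the unchanged adjacency lists each time. Caching or avoiding that loop-invariant structure is the kind of problem that iterative MapReduce platforms and graph-targeted frameworks are designed to address.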
