Exploring Computation Locality of Graph Mining Algorithms on MapReduce

Previous implementations of graph mining algorithms on MapReduce ignore locality in distributed systems. In a distributed system, locality means that operations run on the local computing node without communicating with remote nodes. In this paper we present the LI-MR (Local Iteration MapReduce) framework, which improves a class of graph operators that can be expressed as repeated matrix-vector multiplications. LI-MR exploits the locality of subgraphs and adopts a coarse-grained communication unit for MapReduce. In particular, within a subgraph only some of the operations require synchronization. We propose a method for random data access on Hadoop that writes results to HBase. With the range queries that HBase supports, LI-MR lets each subgraph complete its computation task with the information it needs held in main memory. Because of the locality of subgraphs, the amount of information needed for the computation is limited. In this way, the LI-MR framework combines in-memory computation with the MapReduce model for graph algorithms.
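To make the phrase "repeated matrix-vector multiplications" concrete, the following is a minimal single-machine sketch, not the LI-MR implementation. It assumes PageRank as the example operator, a toy graph of two triangles joined by one link, and a hand-written partition; the names `matvec`, `pagerank`, and `split_by_locality` are hypothetical and chosen for illustration only. The last function only counts which edges stay inside a subgraph and which cross partitions, which is the distinction that locality-aware processing relies on.

```python
from collections import defaultdict


def matvec(edges, vec):
    """Sparse matrix-vector product: out[dst] += weight * vec[src]."""
    out = defaultdict(float)
    for src, dst, weight in edges:
        out[dst] += weight * vec[src]
    return out


def pagerank(adj, damping=0.85, iters=50):
    """PageRank as repeated matrix-vector multiplication.

    Assumes every node has at least one out-edge (no dangling nodes).
    """
    nodes = sorted(adj)
    n = len(nodes)
    # Column-stochastic transition matrix stored as (src, dst, weight) triples.
    edges = [(u, v, 1.0 / len(adj[u])) for u in nodes for v in adj[u]]
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        prod = matvec(edges, rank)
        rank = {u: (1.0 - damping) / n + damping * prod[u] for u in nodes}
    return rank


def split_by_locality(adj, part):
    """Separate edges inside one partition from edges that cross partitions."""
    local, remote = [], []
    for u, neighbours in adj.items():
        for v in neighbours:
            (local if part[u] == part[v] else remote).append((u, v))
    return local, remote


# Toy graph (an assumption for illustration): two triangles joined by 2 <-> 3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
part = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

rank = pagerank(adj)
local_edges, remote_edges = split_by_locality(adj, part)
print(len(local_edges), "local edges,", len(remote_edges), "cross-partition edges")
```

In this toy partition most edges are local, so most of each multiplication touches only values held by the same subgraph; only the contributions along the cross-partition edges would need to be exchanged between nodes.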
