Maiter: A Message-Passing Distributed Framework for Accumulative Iterative Computation

A myriad of machine learning and data mining algorithms require processing data sets iteratively. To scale to massive data sets, these iterative algorithms must be implemented in a distributed environment. To accelerate iterative computations in a large-scale distributed environment, we identify a broad class of iterative computations that can accumulate iterative update results. Specifically, traditional iterative computation updates the result in each iteration based on the result from the previous iteration. Accumulative iterative computation instead accumulates the intermediate iterative update results. We prove that an accumulative update yields the same result as its corresponding traditional iterative update. Furthermore, accumulative iterative computation can be performed asynchronously and converges much faster than traditional iterative computation. We present a general computation model to describe asynchronous accumulative iterative computation. Based on this computation model, we design and implement a message-passing distributed framework, Maiter. We evaluate Maiter on the Amazon EC2 cloud with 100 EC2 instances. Our results show that Maiter achieves up to a 60x speedup over Hadoop when implementing iterative algorithms.
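The abstract does not state the update rule. The following Python sketch therefore assumes PageRank (damping factor `d`, with every vertex having at least one out-link) as an illustrative example of the distinction. It is not Maiter's implementation. The traditional version recomputes every value from the previous iteration's values. The accumulative version adds each vertex's incoming delta to its value and forwards only the scaled change to its neighbors. The asynchronous version drops the iteration barrier and lets vertices consume pending deltas in random order. All function names are hypothetical and are not part of Maiter's API.

```python
import random

# Sketch only: PageRank is an assumed example; every vertex needs >= 1 out-link.
# out_links[i] lists the vertices that vertex i links to.


def traditional_pagerank(out_links, d=0.8, iters=200):
    """Each iteration recomputes every value from the previous iteration's values."""
    n = len(out_links)
    r = [0.0] * n
    for _ in range(iters):
        nxt = [(1 - d) / n] * n
        for i, targets in enumerate(out_links):
            share = d * r[i] / len(targets)
            for j in targets:
                nxt[j] += share
        r = nxt
    return r


def accumulative_pagerank(out_links, d=0.8, iters=200):
    """Each vertex adds its delta to its value and forwards only the scaled delta."""
    n = len(out_links)
    v = [0.0] * n
    delta = [(1 - d) / n] * n
    for _ in range(iters):
        nxt = [0.0] * n
        for i, targets in enumerate(out_links):
            v[i] += delta[i]  # accumulate this iteration's update
            share = d * delta[i] / len(targets)
            for j in targets:
                nxt[j] += share  # send only the change to neighbors
        delta = nxt
    return v


def async_accumulative_pagerank(out_links, d=0.8, tol=1e-12, seed=0):
    """Vertices consume pending deltas in random order, with no iteration barrier."""
    rng = random.Random(seed)
    n = len(out_links)
    v = [0.0] * n
    pending = [(1 - d) / n] * n  # deltas received but not yet applied
    while max(pending) > tol:
        i = rng.randrange(n)
        delta, pending[i] = pending[i], 0.0  # take and clear this vertex's delta
        v[i] += delta
        share = d * delta / len(out_links[i])
        for j in out_links[i]:
            pending[j] += share
    return v


if __name__ == "__main__":
    graph = [[1, 2], [2], [0], [0, 2]]  # hypothetical 4-vertex graph
    print(traditional_pagerank(graph))
    print(accumulative_pagerank(graph))
    print(async_accumulative_pagerank(graph))
```

In this sketch, starting the traditional iteration from zero makes both synchronous versions compute the same partial sum after each iteration. The deltas combine by addition, which is commutative and associative, so the order in which the asynchronous version applies them does not change the value it converges to.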