Asynchronous Computation Model for Large-Scale Iterative Computations

Iterative algorithms are common in machine learning and data mining applications. To scale to massive data sets, these algorithms have to be implemented in a large-scale distributed environment. Synchronous iterations, however, can perform unexpectedly poorly because of stragglers in a heterogeneous distributed environment, especially in the cloud. To bypass the synchronization barriers in iterative computations, this chapter introduces an asynchronous iteration model, delta-based accumulative iterative computation (DAIC). Unlike traditional iterative computations, which update the result based on the result of the previous iteration, DAIC asynchronously updates the result by accumulating the “changes” between iterations. This chapter presents a general asynchronous computation model to describe DAIC and introduces Maiter, a distributed framework for asynchronous iteration. The experimental results show that Maiter outperforms many other state-of-the-art frameworks.
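To make the idea of accumulating "changes" concrete, the following is a minimal single-process sketch of a delta-based PageRank computation. It is an illustration only and not Maiter's API: the function name `delta_pagerank`, the dictionary-based graph representation, the damping factor of 0.8, and the tolerance are all assumptions chosen for the example. Each vertex keeps an accumulated value and a pending delta. Vertices are processed in no fixed order and without any barrier: a vertex folds its pending delta into its value and forwards a damped share of that delta to its out-neighbours.

```python
from collections import deque


def delta_pagerank(graph, damping=0.8, tol=1e-10):
    """Hypothetical sketch of delta-based accumulation for PageRank.

    Assumptions: `graph` maps every vertex to a list of its out-neighbours,
    and every neighbour also appears as a key. Scores are unnormalised,
    i.e. they satisfy R(j) = (1 - damping) + damping * sum_i R(i) / outdeg(i).
    """
    value = {v: 0.0 for v in graph}            # accumulated result
    delta = {v: 1.0 - damping for v in graph}  # pending change per vertex
    queue = deque(graph)                       # vertices with pending work
    queued = set(graph)

    while queue:
        v = queue.popleft()
        queued.discard(v)
        d = delta[v]
        if d <= tol:
            continue                           # change too small to matter
        delta[v] = 0.0
        value[v] += d                          # accumulate the change
        outs = graph[v]
        for u in outs:
            # Forward only the change, never the full value.
            delta[u] += damping * d / len(outs)
            if u not in queued and delta[u] > tol:
                queue.append(u)
                queued.add(u)
    return value


if __name__ == "__main__":
    example = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for vertex, score in sorted(delta_pagerank(example).items()):
        print(vertex, round(score, 6))
```

The sketch uses a FIFO queue, but any processing order reaches the same result up to the tolerance, because the accumulation is addition (commutative and associative) and the propagation step is linear. In a distributed setting, that order independence is what allows workers to proceed without waiting for one another.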
