Scaling Distributed Machine Learning with the Parameter Server

Big data may hold great value, but it also brings many challenges to computing theory, architecture, frameworks, knowledge discovery algorithms, and domain-specific tools and applications. Beyond the 4-V or 5-V characteristics of big datasets, big data processing tends to be inexact, incremental, and inductive. This opens new research opportunities for the community across theory, systems, algorithms, and applications. Is there a new "theory" for big data? How can data computing algorithms be made operational in practice? This report shares some views on the new challenges identified and covers application scenarios such as microblog data analysis and data processing for building next-generation search engines.
