Speculative Approximations for Terascale Distributed Gradient Descent Optimization

Model calibration is a major challenge for the many statistical analytics packages that are increasingly used in Big Data applications. Identifying the optimal model parameters is a time-consuming process that has to be executed from scratch for every dataset/model combination, even by experienced data scientists. We argue that the principal causes are the inability to evaluate multiple parameter configurations simultaneously and the lack of support for quickly identifying sub-optimal configurations. In this paper, we develop two database-inspired techniques for efficient model calibration. Speculative parameter testing applies advanced parallel multi-query processing methods to evaluate several configurations concurrently. Online aggregation identifies sub-optimal configurations early in the processing by incrementally sampling the training dataset and estimating the objective function corresponding to each configuration. We design concurrent online aggregation estimators and define halting conditions that stop the execution accurately and in a timely manner. We apply the proposed techniques to distributed gradient descent optimization (batch and incremental) for support vector machines and logistic regression models. We implement the resulting solutions in GLADE PF-OLA, a state-of-the-art Big Data analytics system, and evaluate their performance over terascale-size synthetic and real datasets. The results confirm that as many as 32 configurations can be evaluated concurrently almost as fast as one, while sub-optimal configurations are detected accurately in as little as 1/20th of the time.
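The interplay between speculative testing and online aggregation can be illustrated with a minimal single-machine sketch. It is not the GLADE PF-OLA implementation: the function names, the chunk size, the normal-approximation confidence intervals, and the pruning rule are all assumptions made for illustration. Each configuration is represented by a hypothetical candidate weight vector for logistic regression; every candidate is scored on the same growing random sample, and a candidate is dropped once the lower bound of its estimated loss lies above the upper bound of the best candidate.

```python
import math
import random


def logistic_loss(w, x, y):
    """Logistic loss of one example; the label y is +1 or -1."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # Numerically stable form of log(1 + exp(-margin)).
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def prune_by_sampling(data, configs, chunk=200, z=1.96, seed=0):
    """Estimate the mean loss of every candidate from one shared sample.

    data    : list of (features, label) pairs
    configs : dict mapping a name to a candidate weight vector (hypothetical)
    Returns the surviving candidate names and the number of examples read.
    """
    rng = random.Random(seed)
    order = list(range(len(data)))
    rng.shuffle(order)  # random order, so every prefix is a random sample

    # Per candidate: [count, sum of losses, sum of squared losses].
    stats = {name: [0, 0.0, 0.0] for name in configs}
    alive = set(configs)
    seen = 0

    while seen < len(order) and len(alive) > 1:
        # One pass over the next chunk updates all live candidates at once.
        for i in order[seen:seen + chunk]:
            x, y = data[i]
            for name in alive:
                loss = logistic_loss(configs[name], x, y)
                s = stats[name]
                s[0] += 1
                s[1] += loss
                s[2] += loss * loss
        seen = min(seen + chunk, len(order))

        # Approximate confidence interval for each candidate's mean loss.
        bounds = {}
        for name in alive:
            n, total, total_sq = stats[name]
            mean = total / n
            var = max(total_sq / n - mean * mean, 0.0)
            half = z * math.sqrt(var / n)
            bounds[name] = (mean - half, mean + half)

        # Halting rule (an assumption): drop candidates that are clearly worse.
        best_upper = min(hi for _, hi in bounds.values())
        alive = {name for name in alive if bounds[name][0] <= best_upper}

    return alive, seen
```

Because all candidates share each pass over the sample, adding a candidate costs extra computation but no extra data access, which is the intuition behind evaluating many configurations almost as fast as one. The interval used here ignores the correlation between candidates and the finite-population correction, so it should be read as a rough stand-in for the estimators designed in the paper.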
