Data Driven Resource Allocation for Distributed Learning

In distributed machine learning, data is dispatched to multiple machines for processing. Motivated by the fact that similar data points often belong to the same or similar classes, and more generally that classification rules of high accuracy tend to be "locally simple but globally complex" (Vapnik & Bottou 1993), we propose data-dependent dispatching that takes advantage of such structure. We present an in-depth analysis of this model, providing new algorithms with provable worst-case guarantees, an analysis showing that existing scalable heuristics perform well under natural, non-worst-case conditions, and techniques for extending a dispatching rule from a small sample to the entire distribution. We overcome novel technical challenges to satisfy important conditions for accurate distributed learning, including fault tolerance and balancedness. We empirically compare our approach with baselines based on random partitioning, balanced partition trees, and locality-sensitive hashing, showing that we achieve significantly higher accuracy on both synthetic and real-world image and advertising datasets. We also demonstrate that our technique strongly scales with the available computing power.
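The pipeline the abstract describes can be illustrated with a minimal sketch: learn a dispatch rule on a small sample, extend it to the full dataset by routing each point to the machine owning its nearest center, train one local learner per machine, and answer each query with the model on the machine it dispatches to. The sketch below is not the paper's algorithm; k_machines, dispatch(), the k-means dispatcher, the logistic-regression local learners, and the synthetic make_classification data are all hypothetical stand-ins.

```python
"""
Illustrative sketch only: data-dependent dispatch via clustering a small
sample, then routing every point to the machine that owns its nearest
center. All names here are assumptions, not the paper's method, which
additionally guarantees balancedness and fault tolerance.
"""
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a large image/advertising dataset.
X, y = make_classification(n_samples=6000, n_features=20, n_informative=10,
                           n_classes=3, n_clusters_per_class=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

k_machines = 8                      # number of workers (assumed)
rng = np.random.RandomState(0)

# 1) Learn a dispatch rule from a small sample (plain k-means centers here).
sample = X_train[rng.choice(len(X_train), size=500, replace=False)]
centers = KMeans(n_clusters=k_machines, n_init=10,
                 random_state=0).fit(sample).cluster_centers_

def dispatch(points):
    """Route each point to the machine owning its nearest center."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

# 2) Extend the rule to the full training set; train one local model per machine.
part = dispatch(X_train)
local = {}
for m in range(k_machines):
    Xm, ym = X_train[part == m], y_train[part == m]
    if len(np.unique(ym)) < 2:      # degenerate partition: constant predictor
        local[m] = ym[0] if len(ym) else 0
    else:
        local[m] = LogisticRegression(max_iter=1000).fit(Xm, ym)

# 3) Answer each test point with the model on the machine it dispatches to.
test_part = dispatch(X_test)
preds = np.empty(len(X_test), dtype=int)
for m in range(k_machines):
    idx = test_part == m
    if not idx.any():
        continue
    model = local[m]
    preds[idx] = (model if isinstance(model, (int, np.integer))
                  else model.predict(X_test[idx]))

print("accuracy with data-dependent dispatch:", (preds == y_test).mean())
```

The sketch omits the two constraints the abstract highlights: balancedness (no machine receives too much or too little data) and fault tolerance (replicating points across machines), which the paper's dispatcher enforces with provable guarantees.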

[1] Sudipto Guha, et al. A constant-factor approximation algorithm for the k-median problem (extended abstract), 1999, STOC '99.

[2] David P. Woodruff, et al. Improved Distributed Principal Component Analysis, 2014, NIPS.

[3] Sariel Har-Peled, et al. Fast Clustering with Lower Bounds: No Customer too Far, No Shop too Small, 2013, ArXiv.

[4] Chih-Jen Lin, et al. LIBLINEAR: A Library for Large Linear Classification, 2008, J. Mach. Learn. Res.

[5] Evangelos Markakis, et al. Greedy facility location algorithms analyzed using dual fitting with factor-revealing LP, 2002, JACM.

[6] Chaitanya Swamy, et al. Approximation Algorithms for Clustering Problems with Lower Bounds and Outliers, 2016, ICALP.

[7] Kenneth Steiglitz, et al. Combinatorial Optimization: Algorithms and Complexity, 1981.

[8] Jaroslaw Byrka, et al. Bi-Factor Approximation Algorithms for Hard Capacitated k-Median Problems, 2013, SODA.

[9] Léon Bottou, et al. Local Algorithms for Pattern Recognition and Dependencies Estimation, 1993, Neural Computation.

[10] Samir Khuller, et al. Achieving anonymity via clustering, 2006, PODS '06.

[11] Sudipto Guha, et al. Hierarchical placement and network design problems, 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[12] Andrew V. Goldberg, et al. Graph Partitioning with Natural Cuts, 2011, IEEE International Parallel & Distributed Processing Symposium.

[13] Rishabh K. Iyer, et al. Mixed Robust/Average Submodular Partitioning: Fast Algorithms, Guarantees, and Applications, 2015, NIPS.

[14] Samir Khuller, et al. The Capacitated K-Center Problem, 2000, SIAM J. Discret. Math.

[15] Nicole Immorlica, et al. Locality-sensitive hashing scheme based on p-stable distributions, 2004, SCG '04.

[16] Mohammad Mahdian, et al. Universal Facility Location, 2003, ESA.

[17] Shanfei Li, et al. An Improved Approximation Algorithm for the Hard Uniform Capacitated k-median Problem, 2014, APPROX-RANDOM.

[18] Samir Khuller, et al. LP Rounding for k-Centers with Non-uniform Hard Capacities, 2012, IEEE 53rd Annual Symposium on Foundations of Computer Science.

[19] Manu Agarwal, et al. k-Means++ under approximation stability, 2015, Theor. Comput. Sci.

[20] Robert Krauthgamer, et al. Navigating nets: simple algorithms for proximity search, 2004, SODA '04.

[21] David R. Karger, et al. Building Steiner trees with incomplete global knowledge, 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[22] Maria-Florina Balcan, et al. Distributed Learning, Communication Complexity and Privacy, 2012, COLT.

[23] Shai Ben-David, et al. PLAL: Cluster-based active learning, 2013, COLT.

[24] Sanjoy Dasgupta, et al. Randomized partition trees for exact nearest neighbor search, 2013, COLT.

[25] Sergei Vassilvitskii, et al. k-means++: the advantages of careful seeding, 2007, SODA '07.

[26] Tim Roughgarden, et al. Decompositions of triangle-dense graphs, 2013, SIAM J. Comput.

[27] Mark Braverman, et al. Approximate Nash Equilibria under Stability Conditions, 2010, ArXiv.

[28] Karen Aardal, et al. Approximation algorithms for hard capacitated k-facility location problems, 2013, Eur. J. Oper. Res.

[29] Sanjoy Dasgupta, et al. The curse of dimension in nonparametric regression, 2010.

[30] Aditya Bhaskara, et al. Centrality of trees for capacitated k-center, 2014, Mathematical Programming.

[31] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[32] Dumitru Erhan, et al. Going deeper with convolutions, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33] Vahab S. Mirrokni, et al. Distributed Balanced Partitioning via Linear Embedding, 2015, WSDM.

[34] Marc Lelarge, et al. Balanced graph edge partition, 2014, KDD.

[35] Shai Ben-David, et al. A framework for statistical clustering with constant time approximation algorithms for K-median and K-means clustering, 2007, Machine Learning.

[36] Shai Ben-David, et al. Access to Unlabeled Data can Speed up Prediction Time, 2011, ICML.

[37] Alexander J. Smola, et al. Communication Efficient Distributed Machine Learning with the Parameter Server, 2014, NIPS.

[38] Judit Bar-Ilan, et al. How to Allocate Network Centers, 1993, J. Algorithms.

[39] James Demmel, et al. CA-SVM: Communication-Avoiding Support Vector Machines on Clusters, 2016.

[40] Ulrike von Luxburg, et al. Nearest Neighbor Clustering: A Baseline Method for Consistent Clustering with Arbitrary Objective Functions, 2009, J. Mach. Learn. Res.

[41] Alexandr Andoni, et al. Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions, 2006, 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[42] Martin J. Wainwright, et al. Communication-efficient algorithms for statistical optimization, 2012, 51st IEEE Conference on Decision and Control (CDC).

[43] Maria-Florina Balcan, et al. Distributed k-means and k-median clustering on general communication topologies, 2013, NIPS.

[44] Maria-Florina Balcan, et al. Clustering under approximation stability, 2013, JACM.

[45] Alex Krizhevsky, et al. Learning Multiple Layers of Features from Tiny Images, 2009.

[46] Hans-Arno Jacobsen, et al. PNUTS: Yahoo!'s hosted data serving platform, 2008, Proc. VLDB Endow.

[47] Shi Li, et al. Approximating capacitated k-median with (1 + ∊)k open facilities, 2014, SODA.

[48] S. Canu, et al. Training Invariant Support Vector Machines using Selective Sampling, 2005.

[49] Martin J. Wainwright, et al. Information-theoretic lower bounds for distributed statistical estimation with communication constraints, 2013, NIPS.