How to Intelligently Distribute Training Data to Multiple Compute Nodes: Distributed Machine Learning via Submodular Partitioning

In this paper we investigate the problem of partitioning training data for parallel learning of statistical models. Motivated by [10], we use submodular functions to model the utility of data subsets for training machine learning classifiers, and we formulate the problem mathematically as submodular partitioning. We introduce a simple and scalable greedy algorithm that solves the submodular partitioning problem near-optimally. We empirically demonstrate the efficacy of the proposed algorithm in obtaining data partitions for distributed optimization of convex and deep neural network objectives. Empirical evidence suggests that the intelligent data partitioning produced by the proposed framework leads to faster convergence in distributed convex optimization and to better resulting models in parallel neural network training.
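To make the idea of greedy submodular partitioning concrete, the following is a minimal sketch, not the algorithm from the paper. It assumes a max-min flavor of the problem (make the worst-off block as useful as possible) and a toy coverage utility. The names `coverage_utility` and `greedy_partition` and the example feature sets are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Set


def coverage_utility(features: List[Set[str]]) -> Callable[[Set[int]], float]:
    """Build f(S) = number of distinct features covered by items in S.

    Set coverage is a standard example of a monotone submodular function;
    it stands in here for whatever utility one would use in practice.
    """
    def f(subset: Set[int]) -> float:
        covered: Set[str] = set()
        for i in subset:
            covered |= features[i]
        return float(len(covered))
    return f


def greedy_partition(n_items: int, n_blocks: int,
                     f: Callable[[Set[int]], float]) -> List[Set[int]]:
    """Heuristic sketch: repeatedly give the lowest-valued block the
    unassigned item with the largest marginal gain for that block."""
    blocks: List[Set[int]] = [set() for _ in range(n_blocks)]
    values = [f(b) for b in blocks]
    unassigned = set(range(n_items))
    while unassigned:
        # Block that is currently worst off (ties broken by lowest index).
        j = min(range(n_blocks), key=lambda b: values[b])
        # Item with the largest marginal gain f(A_j + i) - f(A_j).
        best = max(sorted(unassigned),
                   key=lambda i: f(blocks[j] | {i}) - values[j])
        blocks[j].add(best)
        values[j] = f(blocks[j])
        unassigned.remove(best)
    return blocks


# Hypothetical toy data: each training item is described by a set of features.
features = [{"a", "b"}, {"a"}, {"b", "c"}, {"c"}, {"d"}, {"a", "d"}]
f = coverage_utility(features)
blocks = greedy_partition(len(features), 2, f)
print(blocks, [f(b) for b in blocks])
```

Each block would then be sent to one compute node. The sketch re-evaluates the utility from scratch at every step, which is fine for a toy example but would need incremental evaluation to scale.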

[1] Rishabh K. Iyer, et al. Submodularity in Data Subset Selection and Active Learning, 2015, ICML.

[2] Jan Vondrák, et al. Optimal approximation for the submodular welfare problem in the value oracle model, 2008, STOC.

[3] Jeff A. Bilmes, et al. Submodular subset selection for large-scale speech training data, 2014, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[4] Satoru Fujishige, et al. Submodular functions and optimization, 1991.

[5] Andreas Krause, et al. Efficient Minimization of Decomposable Submodular Functions, 2010, NIPS.

[6] Vahab S. Mirrokni, et al. Approximating submodular functions everywhere, 2009, SODA.

[7] Xiaohui Zhang, et al. Parallel training of Deep Neural Networks with Natural Gradient and Parameter Averaging, 2014, ICLR.

[8] S. Thompson, et al. Moore's law: the future of Si microelectronics, 2006.

[9] Rishabh K. Iyer, et al. Learning Mixtures of Submodular Functions for Image Collection Summarization, 2014, NIPS.

[10] Jasper Snoek, et al. Practical Bayesian Optimization of Machine Learning Algorithms, 2012, NIPS.

[11] Michel Minoux, et al. Accelerated greedy algorithms for maximizing submodular set functions, 1978.

[12] H. B. McMahan, et al. Robust Submodular Observation Selection, 2008.

[13] Jeff A. Bilmes, et al. Unsupervised submodular subset selection for speech data, 2014, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[14] D. Golovin. Max-min fair allocation of indivisible goods, 2005.

[15] Stephen P. Boyd, et al. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers, 2011, Found. Trends Mach. Learn.

[16] Amin Saberi, et al. An approximation algorithm for max-min fair allocation of indivisible goods, 2007, STOC.

[17] Rishabh K. Iyer, et al. Mixed Robust/Average Submodular Partitioning: Fast Algorithms, Guarantees, and Applications, 2015, NIPS.

[18] Subhash Khot, et al. Approximation Algorithms for the Max-Min Allocation Problem, 2007, APPROX-RANDOM.