Training sample selection for deep learning of distributed data

The success of deep learning with multi-layer neural networks depends critically on the volume and variety of training data. Its potential is greatly compromised when training data originate in a geographically distributed manner and are subject to bandwidth constraints. This paper presents a data sampling approach to deep learning that carefully discriminates among locally available training samples based on their relative importance. To this end, we propose two metrics for prioritizing candidate training samples as functions of their test-trial outcomes: correctness and confidence. Bandwidth-constrained simulations show significant performance gains of our proposed training sample selection schemes over conventional uniform sampling: up to a 15× reduction in bandwidth for the MNIST dataset and a 25% reduction in learning time for the CIFAR-10 dataset.
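The abstract does not spell out the exact selection rule, so the sketch below is only a plausible illustration of the idea: a local node runs a test trial of the current model on its own samples, scores each sample by whether the prediction was wrong (correctness) and by how unsure the model was (confidence), and forwards the highest-scoring samples within a bandwidth budget. The function name prioritize_samples, the linear weighting, and the budget parameter are assumptions for illustration, not the paper's formulation.

```python
import numpy as np

def prioritize_samples(probs, labels, budget, w_correct=1.0, w_confident=1.0):
    """Rank locally held samples by an importance score and keep the top `budget`.

    probs  : (N, C) softmax outputs from a test trial of the current model.
    labels : (N,) ground-truth class indices.
    The score below is an illustrative assumption: misclassified samples and
    low-confidence samples are treated as more informative and are sent first.
    """
    preds = probs.argmax(axis=1)
    confidence = probs.max(axis=1)               # model's confidence in its prediction
    incorrect = (preds != labels).astype(float)  # 1 if the current model errs on the sample

    # Higher score = higher priority for transmission/training under the bandwidth budget.
    score = w_correct * incorrect + w_confident * (1.0 - confidence)
    return np.argsort(-score)[:budget]

# Example: keep the 1,000 highest-priority of 10,000 local samples.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(10), size=10_000)
labels = rng.integers(0, 10, size=10_000)
selected = prioritize_samples(probs, labels, budget=1_000)
```

Favoring misclassified and low-confidence samples mirrors familiar hard-example-mining and importance-sampling heuristics; in a bandwidth-constrained setting, a node would typically re-score its local pool periodically as the shared model improves.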
