Stochastic Subset Selection

Current machine learning algorithms are designed to work with huge volumes of high-dimensional data such as images. However, these algorithms are increasingly deployed on resource-constrained systems such as mobile devices and embedded systems. Even when large computing infrastructure is available, the size of individual data instances, as well as of entire datasets, can become a major bottleneck for data transfer across communication channels. There is also a strong incentive, in both energy and monetary terms, to reduce the computational and memory requirements of these algorithms. For non-parametric models that must access the stored training data at inference time, the increased memory and computation cost can be even more problematic. In this work, we aim to reduce the volume of data these algorithms must process through an end-to-end, two-stage neural subset selection model: the first stage selects a set of candidate points using a conditionally independent Bernoulli mask, and the second stage iteratively selects a coreset via a conditional Categorical distribution. The subset selection model is trained by meta-learning over a distribution of sets. We validate our method on set reconstruction and classification tasks with feature selection, as well as on the selection of representative samples from a given dataset, where it outperforms relevant baselines. Our experiments also show that our method enhances the scalability of non-parametric models such as Neural Processes.
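
To make the two-stage pipeline described above more concrete, here is a minimal PyTorch sketch, not the authors' implementation: the conditionally independent Bernoulli mask is relaxed with a RelaxedBernoulli distribution, and the iterative coreset selection is approximated with Gumbel-Softmax draws from a Categorical over the remaining candidates. Module names, network sizes, the temperature, and the coreset size are all illustrative assumptions.

```python
# Sketch of two-stage stochastic subset selection (illustrative, not the paper's code).
import torch
import torch.nn as nn

class StochasticSubsetSelector(nn.Module):
    def __init__(self, dim, hidden=64, coreset_size=8, tau=0.5):
        super().__init__()
        self.tau = tau                    # relaxation temperature (assumed value)
        self.coreset_size = coreset_size
        # Stage 1: per-element logits for a conditionally independent Bernoulli mask.
        self.mask_net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        # Stage 2: scores defining the conditional Categorical over remaining candidates.
        self.score_net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, x):
        # x: (batch, n, dim) -- one set of n elements per task instance.
        # Stage 1: sample a relaxed Bernoulli mask over candidate points.
        logits = self.mask_net(x).squeeze(-1)                           # (batch, n)
        mask = torch.distributions.RelaxedBernoulli(self.tau, logits=logits).rsample()

        # Stage 2: iteratively pick coreset elements with Gumbel-Softmax draws
        # from a Categorical conditioned on the still-available candidates.
        scores = self.score_net(x).squeeze(-1)                          # (batch, n)
        available = mask.clone()
        picks = []
        for _ in range(self.coreset_size):
            cat_logits = scores + torch.log(available + 1e-8)           # suppress unavailable points
            onehot = nn.functional.gumbel_softmax(cat_logits, tau=self.tau, hard=True)
            picks.append(onehot)
            available = available * (1.0 - onehot)                      # remove the chosen element
        sel = torch.stack(picks, dim=1)                                  # (batch, k, n)
        return sel @ x                                                   # (batch, k, dim) selected subset

# Toy usage: select an 8-point subset from sets of 100 random 2-D points.
selector = StochasticSubsetSelector(dim=2)
subset = selector(torch.randn(4, 100, 2))
print(subset.shape)  # torch.Size([4, 8, 2])
```

Because both stages use reparameterized (Concrete/Gumbel-Softmax) samples, the whole selector stays differentiable and can be trained end-to-end together with a downstream set model.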
