Nonparametric Bayesian Deep Networks with Local Competition

The aim of this work is to enable inference of deep networks that retain high accuracy at the lowest possible model complexity, with the latter deduced from the data during inference. To this end, we revisit deep networks built from blocks of competing linear units, as opposed to nonlinear units that entail no form of (local) competition. In this context, our main technical innovation is an inferential setup that draws on solid arguments from Bayesian nonparametrics. We infer both the needed set of connections, or locally competing sets of units, and the floating-point precision required to store the network parameters. Specifically, we introduce auxiliary discrete latent variables indicating which of the initial network components are actually needed for modeling the data at hand, and perform Bayesian inference over them by imposing appropriate stick-breaking priors. As we show experimentally on benchmark datasets, our approach yields networks with a smaller computational footprint than the state of the art, with no compromise in predictive accuracy.
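
To make the two ingredients of this construction concrete, the minimal NumPy sketch below pairs a truncated stick-breaking (IBP-style) prior over component utilities with a block of locally competing, winner-take-all linear units. It is an illustrative assumption rather than the paper's implementation: the names (`stick_breaking_probs`, `lwta_block`), the hard argmax winner selection, and sampling the utility indicators directly from the prior (instead of from an inferred posterior) are simplifications for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking_probs(alpha, K, rng):
    """Truncated stick-breaking construction (IBP-style):
    v_k ~ Beta(alpha, 1), pi_k = prod_{j<=k} v_j.
    pi_k is the prior probability that component k is retained."""
    v = rng.beta(alpha, 1.0, size=K)
    return np.cumprod(v)

def lwta_block(x, W, z):
    """Block of locally competing *linear* units (hard winner-take-all).
    x: (D,) input; W: (D, K, U) weights for K blocks of U competing units;
    z: (K,) binary utility indicators saying which blocks are retained.
    Within each retained block only the unit with the largest linear
    response survives; the rest are zeroed out."""
    h = np.einsum('d,dku->ku', x, W)            # linear responses, shape (K, U)
    winners = h.argmax(axis=1)                  # winning unit per block
    out = np.zeros_like(h)
    idx = np.arange(h.shape[0])
    out[idx, winners] = h[idx, winners]         # keep only the winners
    return out * z[:, None]                     # drop blocks deemed unnecessary

# Toy dimensions (hypothetical, for illustration only).
D, K, U, alpha = 8, 6, 2, 2.0
x = rng.normal(size=D)
W = rng.normal(scale=0.1, size=(D, K, U))

pi = stick_breaking_probs(alpha, K, rng)        # prior block-utility probabilities
z = rng.binomial(1, pi)                         # sample which blocks are active
y = lwta_block(x, W, z)
print("active blocks:", z)
print("block outputs:\n", y)
```

In this sketch, blocks whose sampled utility indicator is zero contribute nothing to the output; this is the mechanism through which a stick-breaking prior over discrete indicators lets the data switch off unneeded components, shrinking the effective model complexity.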
