Competing Mutual Information Constraints with Stochastic Competition-based Activations for Learning Diversified Representations

This work addresses the long-standing problem of learning diversified representations. To this end, we combine information-theoretic arguments with stochastic competition-based activations, namely Stochastic Local Winner-Takes-All (LWTA) units. In this context, we depart from the conventional deep architectures commonly used in representation learning, which rely on non-linear activations; instead, we replace them with sets of locally and stochastically competing linear units. In this setting, each network layer yields sparse outputs, determined by the outcome of the competition among units organized into blocks of competitors. We adopt stochastic arguments for the competition mechanism, performing posterior sampling to determine the winner of each block. We further endow the considered networks with the ability to infer the sub-part of the network that is essential for modeling the data at hand, imposing appropriate stick-breaking priors to this end. To further enrich the information of the emerging representations, we resort to information-theoretic principles, namely the Information Competing Process (ICP). All these components are then tied together under the stochastic Variational Bayes framework for inference. We perform a thorough experimental evaluation of our approach on benchmark image-classification datasets. As we experimentally show, the resulting networks exhibit strong discriminative representation-learning abilities. In addition, the introduced paradigm allows for a principled investigation of the emerging intermediate network representations.
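
To make the competition mechanism concrete, the following is a minimal PyTorch sketch (not the authors' released implementation) of a stochastic LWTA layer: linear units are grouped into blocks, and one winner per block is sampled via a Concrete/Gumbel-Softmax relaxation of the block-wise competition probabilities, yielding the sparse outputs described above. All class names, hyperparameters, and the choice of relaxation temperature are illustrative assumptions; the stick-breaking (IBP) priors and the ICP objective terms are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class StochasticLWTA(nn.Module):
    """Sketch of a stochastic LWTA layer (hypothetical, for illustration).

    A linear layer whose outputs are grouped into blocks of `block_size`
    competing units. Within each block, a single "winner" is sampled from a
    Concrete (Gumbel-Softmax) relaxation of the competition probabilities;
    only the winner's activation is propagated, the rest are zeroed out.
    """

    def __init__(self, in_features: int, num_blocks: int, block_size: int,
                 temperature: float = 0.67):
        super().__init__()
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.temperature = temperature
        self.linear = nn.Linear(in_features, num_blocks * block_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Purely linear pre-activations: competition replaces the nonlinearity.
        h = self.linear(x)                                   # (B, num_blocks * block_size)
        h = h.view(-1, self.num_blocks, self.block_size)     # (B, num_blocks, block_size)

        if self.training:
            # Relaxed one-hot sample of the winner (reparameterized, differentiable).
            winner = F.gumbel_softmax(h, tau=self.temperature, hard=True, dim=-1)
        else:
            # At test time, pick the most probable competitor deterministically.
            winner = F.one_hot(h.argmax(dim=-1), self.block_size).to(h.dtype)

        # Sparse output: only the winning unit of each block passes its value.
        out = h * winner
        return out.view(-1, self.num_blocks * self.block_size)


# Usage (shapes only; sizes are illustrative):
layer = StochasticLWTA(in_features=128, num_blocks=32, block_size=4)
y = layer(torch.randn(16, 128))   # -> (16, 128), at most one nonzero per block
```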
