Deep Variational Information Bottleneck

We present a variational approximation to the information bottleneck of Tishby et al. (1999). This variational approach allows us to parameterize the information bottleneck model using a neural network and leverage the reparameterization trick for efficient training. We call this method "Deep Variational Information Bottleneck", or Deep VIB. We show that models trained with the VIB objective outperform those that are trained with other forms of regularization, in terms of generalization performance and robustness to adversarial attack.

[1]  Boris Polyak,et al.  Acceleration of stochastic approximation by averaging , 1992 .

[2]  Naftali Tishby,et al.  The information bottleneck method , 2000, ArXiv.

[3]  A. U.S.,et al.  Predictability , Complexity , and Learning , 2002 .

[4]  Gal Chechik,et al.  Information Bottleneck for Gaussian Variables , 2003, J. Mach. Learn. Res..

[5]  David Barber,et al.  The IM algorithm: a variational approach to Information Maximization , 2003, NIPS 2003.

[6]  William Bialek,et al.  How Many Clusters? An Information-Theoretic Perspective , 2003, Neural Computation.

[7]  David J. C. MacKay,et al.  Information Theory, Inference, and Learning Algorithms , 2004, IEEE Transactions on Information Theory.

[8]  W. Bialek,et al.  Information-based clustering. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[9]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Ohad Shamir,et al.  Learning and generalization with the information bottleneck , 2008, Theoretical Computer Science.

[11]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[12]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[13]  Joan Bruna,et al.  Intriguing properties of neural networks , 2013, ICLR.

[14]  Lei Ying,et al.  On the relation between identifiability, differential privacy, and mutual-information privacy , 2014, 2014 52nd Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[15]  Naftali Tishby,et al.  Deep learning and the information bottleneck principle , 2015, 2015 IEEE Information Theory Workshop (ITW).

[16]  Terrance E. Boult,et al.  Towards Open World Recognition , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Jason Yosinski,et al.  Deep neural networks are easily fooled: High confidence predictions for unrecognizable images , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Benjamin Graham,et al.  Confusing Deep Convolution Networks by Relabelling , 2015, ArXiv.

[19]  Julien Cornebise,et al.  Weight Uncertainty in Neural Network , 2015, ICML.

[20]  Rahul Sukthankar,et al.  The Virtues of Peer Pressure: A Simple Method for Discovering High-Value Mistakes , 2015, CAIP.

[21]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[22]  Shakir Mohamed,et al.  Variational Information Maximisation for Intrinsically Motivated Reinforcement Learning , 2015, NIPS.

[23]  Jonathon Shlens,et al.  Explaining and Harnessing Adversarial Examples , 2014, ICLR.

[24]  Michael J. Berry,et al.  Predictive information in a sensory population , 2013, Proceedings of the National Academy of Sciences.

[25]  Dale Schuurmans,et al.  Learning with a Strong Adversary , 2015, ArXiv.

[26]  Julien Cornebise,et al.  Weight Uncertainty in Neural Networks , 2015, ArXiv.

[27]  Ryan P. Browne,et al.  Multivariate sharp quadratic bounds via $\mathbf{\Sigma}$-strong convexity and the Fenchel connection , 2015 .

[28]  Seyed-Mohsen Moosavi-Dezfooli,et al.  DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Seyed-Mohsen Moosavi-Dezfooli,et al.  Robustness of classifiers: from adversarial to random noise , 2016, NIPS.

[30]  Charles Blundell,et al.  Early Visual Concept Learning with Unsupervised Deep Learning , 2016, ArXiv.

[31]  Ananthram Swami,et al.  The Limitations of Deep Learning in Adversarial Settings , 2015, 2016 IEEE European Symposium on Security and Privacy (EuroS&P).

[32]  Olivier Marre,et al.  Relevant sparse codes with variational information bottleneck , 2016, NIPS.

[33]  Paul W. Cuff,et al.  Differential Privacy as a Mutual Information Constraint , 2016, CCS.

[34]  Xinyun Chen Under Review as a Conference Paper at Iclr 2017 Delving into Transferable Adversarial Ex- Amples and Black-box Attacks , 2016 .

[35]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[36]  Max Welling,et al.  The Variational Fair Autoencoder , 2015, ICLR.

[37]  David J. Fleet,et al.  Adversarial Manipulation of Deep Representations , 2015, ICLR.

[38]  Honglak Lee,et al.  Deep Variational Canonical Correlation Analysis , 2016, ArXiv.

[39]  Seyed-Mohsen Moosavi-Dezfooli,et al.  Universal Adversarial Perturbations , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Christopher Burgess,et al.  beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework , 2016, ICLR 2016.

[41]  Samy Bengio,et al.  Adversarial examples in the physical world , 2016, ICLR.

[42]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[43]  David A. Wagner,et al.  Towards Evaluating the Robustness of Neural Networks , 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[44]  Omer Levy,et al.  Published as a conference paper at ICLR 2018 S IMULATING A CTION D YNAMICS WITH N EURAL P ROCESS N ETWORKS , 2018 .

[45]  Stefano Soatto,et al.  Information Dropout: Learning Optimal Representations Through Noisy Computation , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Sashank J. Reddi,et al.  On the Convergence of Adam and Beyond , 2018, ICLR.