Flow of Rényi information in deep neural networks

We propose a rate-distortion based training algorithm for deep neural networks (DNNs) that uses a smooth matrix functional on the manifold of positive semi-definite matrices as a non-parametric entropy estimator. The training objective includes not only a measure of output-layer performance but also a measure of information distortion between consecutive layers, so that each layer produces a concise representation of its input. An experiment on speech emotion recognition shows that a DNN trained with this method reaches performance comparable to an encoder-decoder system.
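As a minimal sketch of the kind of non-parametric estimator referred to above, the matrix-based Rényi entropy of Sánchez Giraldo and Príncipe computes the α-order entropy from the eigenvalues of a trace-normalized Gram matrix, with no density estimation. The Gaussian kernel, its width `sigma`, and the order `alpha=2` below are illustrative assumptions, not the paper's actual settings:

```python
import numpy as np

def matrix_renyi_entropy(X, sigma=1.0, alpha=2.0):
    """Matrix-based Renyi alpha-entropy of the rows of X (n samples x d features)."""
    # Pairwise squared Euclidean distances
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    # Gaussian Gram matrix, then normalize so that trace(A) = 1
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    A = K / np.trace(K)
    # Entropy from the eigenvalue spectrum of the normalized Gram matrix:
    # S_alpha(A) = (1 / (1 - alpha)) * log2( sum_i lambda_i^alpha )
    eig = np.linalg.eigvalsh(A)
    eig = eig[eig > 1e-12]  # discard numerically-zero eigenvalues
    return np.log2(np.sum(eig ** alpha)) / (1.0 - alpha)
```

Two sanity checks follow from the spectrum: n identical samples give a rank-one normalized Gram matrix and hence entropy 0, while n well-separated samples give a uniform spectrum and entropy log2(n).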
