Information-Theoretic Methods in Deep Neural Networks: Recent Advances and Emerging Opportunities

We present a review of recent advances and emerging opportunities in analyzing deep neural networks (DNNs) with information-theoretic methods. We first discuss popular information-theoretic quantities and their estimators. We then introduce recent developments in information-theoretic learning principles (e.g., loss functions, regularizers, and objectives) and their parameterization with DNNs. Finally, we briefly review current uses of information-theoretic concepts in several modern machine learning problems and list a few emerging opportunities.
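To make the estimator discussion concrete, below is a minimal sketch (not taken from this review) of one estimator family that appears in this literature: the matrix-based Rényi α-order entropy functional, with mutual information obtained from normalized Gram matrices via a Hadamard product. The Gaussian kernel, the kernel width `sigma`, the value `alpha = 1.01`, and the toy data are illustrative assumptions, not recommendations from the review.

```python
import numpy as np
from scipy.spatial.distance import cdist

def normalized_gram(x, sigma=1.0):
    # Gaussian Gram matrix, normalized so that trace(A) = 1.
    d2 = cdist(x, x, "sqeuclidean")
    k = np.exp(-d2 / (2.0 * sigma ** 2))
    return k / np.trace(k)

def matrix_renyi_entropy(a, alpha=1.01):
    # S_alpha(A) = (1 / (1 - alpha)) * log2( sum_i lambda_i(A)^alpha )
    lam = np.clip(np.linalg.eigvalsh(a), 0.0, None)  # guard tiny negative eigenvalues
    return np.log2(np.sum(lam ** alpha)) / (1.0 - alpha)

def matrix_renyi_mi(x, y, sigma=1.0, alpha=1.01):
    # I_alpha(X; Y) = S_alpha(A) + S_alpha(B) - S_alpha(A o B / tr(A o B)),
    # where "o" denotes the Hadamard (element-wise) product.
    a, b = normalized_gram(x, sigma), normalized_gram(y, sigma)
    ab = a * b
    ab = ab / np.trace(ab)
    return (matrix_renyi_entropy(a, alpha)
            + matrix_renyi_entropy(b, alpha)
            - matrix_renyi_entropy(ab, alpha))

# Toy check: a strongly dependent pair should score higher than an independent one.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y_dependent = x + 0.1 * rng.normal(size=(200, 1))
y_independent = rng.normal(size=(200, 1))
print(matrix_renyi_mi(x, y_dependent))    # relatively large
print(matrix_renyi_mi(x, y_independent))  # close to zero
```

Other estimator families surveyed in this area (e.g., k-nearest-neighbor or variational bounds) expose the same basic interface: take paired samples of X and Y and return a scalar estimate.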
