Understanding Autoencoders with Information Theoretic Concepts
[1] Rob Fergus, et al. Visualizing and Understanding Convolutional Networks, 2013, ECCV.
[2] Simon Haykin, et al. Neural Networks: A Comprehensive Foundation, 1998.
[3] T. Liggett. Interacting Particle Systems, 1985.
[4] Linda G. Shapiro, et al. Modeling Stylized Character Expressions via Deep Learning, 2016, ACCV.
[5] Yoshua Bengio, et al. Understanding intermediate layers using linear classifier probes, 2016, ICLR.
[6] Alexander Binder, et al. On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation, 2015, PLoS ONE.
[7] Sergey Ioffe, et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 2015, ICML.
[8] L. Goddard. Information Theory, 1962, Nature.
[9] Nitish Srivastava, et al. Dropout: a simple way to prevent neural networks from overfitting, 2014, J. Mach. Learn. Res.
[10] Alessandro Rozza, et al. Minimum Neighbor Distance Estimators of Intrinsic Dimension, 2011, ECML/PKDD.
[11] Terrence J. Sejnowski, et al. Learning Overcomplete Representations, 2000, Neural Computation.
[12] Naftali Tishby, et al. Opening the Black Box of Deep Neural Networks via Information, 2017, arXiv.
[13] Geoffrey E. Hinton, et al. Speech recognition with deep recurrent neural networks, 2013, ICASSP.
[14] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.
[15] Ralph Linsker. Self-organization in a perceptual network, 1988, Computer.
[16] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.
[17] Christopher Burgess, et al. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework, 2017, ICLR.
[18] Graham W. Taylor, et al. Adaptive deconvolutional networks for mid and high level feature learning, 2011, ICCV.
[19] Jose C. Principe, et al. Information Theoretic Learning: Rényi's Entropy and Kernel Perspectives, 2010.
[20] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2016, CVPR.
[21] Geoffrey E. Hinton, et al. Visualizing Data using t-SNE, 2008.
[22] Alessandro Rozza, et al. DANCo: An intrinsic dimensionality estimator exploiting angle and norm concentration, 2014, Pattern Recognit.
[23] Artemy Kolchinsky, et al. Estimating Mixture Entropy with Pairwise Distances, 2017, Entropy.
[24] E. Parzen. On Estimation of a Probability Density Function and Mode, 1962.
[25] Antonino Staiano, et al. Intrinsic dimension estimation: Advances and open problems, 2016, Inf. Sci.
[26] Yousef Saad, et al. Trace optimization and eigenproblems in dimension reduction methods, 2011, Numer. Linear Algebra Appl.
[27] Razvan Pascanu, et al. On the number of response regions of deep feed forward networks with piece-wise linear activations, 2013, arXiv:1312.6098.
[28] F. Takens. Detecting strange attractors in turbulence, 1981.
[29] Roland Vollgraf, et al. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms, 2017, arXiv.
[30] Christian Lebiere, et al. The Cascade-Correlation Learning Architecture, 1989, NIPS.
[31] A. Rényi. On Measures of Entropy and Information, 1961.
[32] Badong Chen, et al. Universal Approximation with Convex Optimization: Gimmick or Reality? [Discussion Forum], 2015, IEEE Computational Intelligence Magazine.
[33] Jose C. Principe, et al. Measures of Entropy From Data Using Infinitely Divisible Kernels, 2012, IEEE Transactions on Information Theory.
[34] Ralph Linsker, et al. How to Generate Ordered Maps by Maximizing the Mutual Information between Input and Output Signals, 1989, Neural Computation.
[35] Robert Jenssen, et al. Multivariate Extension of Matrix-based Rényi's α-order Entropy Functional, 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[36] Eder Santana, et al. Autoencoders trained with relevant information: Blending Shannon and Wiener's perspectives, 2017, ICASSP.
[37] Neri Merhav, et al. Data Processing Theorems and the Second Law of Thermodynamics, 2010, IEEE Transactions on Information Theory.
[38] Rajendra Bhatia, et al. Infinitely Divisible Matrices, 2006, Am. Math. Mon.
[39] Che-Wei Huang, et al. Flow of Rényi information in deep neural networks, 2016, IEEE MLSP.
[40] T. M. Liggett. Interacting Particle Systems, 2013.
[41] Alexander Binder, et al. Explaining nonlinear classification decisions with deep Taylor decomposition, 2015, Pattern Recognit.
[42] I. Csiszár. A class of measures of informativity of observation channels, 1972.
[43] Maxim Raginsky, et al. Information-theoretic analysis of generalization capability of learning algorithms, 2017, NIPS.
[44] Dapeng Oliver Wu, et al. Why Deep Learning Works: A Manifold Disentanglement Perspective, 2016, IEEE Transactions on Neural Networks and Learning Systems.
[46] David J. Schwab, et al. An exact mapping between the Variational Renormalization Group and Deep Learning, 2014, arXiv.
[47] V. Kvasnicka, et al. Neural and Adaptive Systems: Fundamentals Through Simulations, 2001, IEEE Trans. Neural Networks.
[48] Andrea Vedaldi, et al. Understanding deep image representations by inverting them, 2015, CVPR.
[49] Venu Govindaraju, et al. Why Regularized Auto-Encoders learn Sparse Representation?, 2015, ICML.
[50] J. S. Marron, et al. A scale-based approach to finding effective dimensionality in manifold learning, 2007, arXiv:0710.5349.
[51] Simon Haykin. Neural Networks and Learning Machines, 2010.
[52] Heng Zhang, et al. Mutual information-based RBM neural networks, 2016, ICPR.
[53] Jose C. Principe, et al. Breaker status uncovered by autoencoders under unsupervised maximum mutual information training, 2013.
[54] Kurt Hornik, et al. Neural networks and principal component analysis: Learning from examples without local minima, 1989, Neural Networks.
[55] José Carlos Príncipe. Rate-Distortion Auto-Encoders, 2013, ICLR.
[56] Hod Lipson, et al. Understanding Neural Networks Through Deep Visualization, 2015, arXiv.
[57] Jason Yosinski, et al. Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks, 2016, arXiv.
[58] Liam Paninski. Estimation of Entropy and Mutual Information, 2003, Neural Computation.
[59] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[60] Alexander Binder, et al. Evaluating the Visualization of What a Deep Neural Network Has Learned, 2015, IEEE Transactions on Neural Networks and Learning Systems.
[62] Jakob Hoydis, et al. An Introduction to Deep Learning for the Physical Layer, 2017, IEEE Transactions on Cognitive Communications and Networking.
[63] David D. Cox, et al. On the information bottleneck theory of deep learning, 2018, ICLR.
[64] D. Vere-Jones. Markov Chains, 1972, Nature.
[65] Naftali Tishby, et al. The information bottleneck method, 2000, arXiv.
[66] Yoshua Bengio, et al. Deep Sparse Rectifier Neural Networks, 2011, AISTATS.
[67] Razvan Pascanu, et al. On the Number of Linear Regions of Deep Neural Networks, 2014, NIPS.
[68] Geoffrey E. Hinton, et al. Reducing the Dimensionality of Data with Neural Networks, 2006, Science.
[69] Quanshi Zhang, et al. Interpreting CNN knowledge via an Explanatory Graph, 2017, AAAI.
[70] Max Tegmark, et al. Why Does Deep and Cheap Learning Work So Well?, 2016, Journal of Statistical Physics.
[71] José Carlos Príncipe, et al. Training MLPs layer-by-layer with the information potential, 1999, IJCNN.
[72] Michel Verleysen, et al. Kernel-based dimensionality reduction using Rényi's α-entropy measures of similarity, 2017, Neurocomputing.
[73] Badong Chen, et al. System Parameter Identification: Information Criteria and Algorithms, 2013.
[74] M. K. Ali, et al. Neural networks for estimating intrinsic dimension, 2002, Physical Review E.
[75] S. Stigler. Gauss and the Invention of Least Squares, 1981.
[76] Yoshua Bengio, et al. Better Mixing via Deep Representations, 2012, ICML.
[77] Yoshua Bengio, et al. Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding, 2013, INTERSPEECH.
[78] Christopher J. Rozell, et al. Stable Takens' Embeddings for Linear Dynamical Systems, 2010, IEEE Transactions on Signal Processing.
[79] David J. C. MacKay. Information Theory, Inference, and Learning Algorithms, 2004, IEEE Transactions on Information Theory.
[80] C. D. Kemp, et al. Density Estimation for Statistics and Data Analysis, 1987.
[81] Naftali Tishby, et al. Deep learning and the information bottleneck principle, 2015, IEEE ITW.
[82] S. P. Luttrell. A Bayesian Analysis of Self-Organizing Maps, 1994, Neural Computation.
[83] Yoshua Bengio, et al. How Auto-Encoders Could Provide Credit Assignment in Deep Networks via Target Propagation, 2014, arXiv.
[84] Changsheng Xu, et al. Understanding Deep Learning Generalization by Maximum Entropy, 2017, arXiv.
[85] A. Pinkus. Ridge Functions: Approximation Algorithms, 2015.
[86] Peter J. Bickel, et al. Maximum Likelihood Estimation of Intrinsic Dimension, 2004, NIPS.
[87] Aram Galstyan, et al. Efficient Estimation of Mutual Information for Strongly Dependent Variables, 2014, AISTATS.
[88] Naren Ramakrishnan, et al. Flow of Information in Feed-Forward Deep Neural Networks, 2016, arXiv.
[89] Yasuaki Kuroe, et al. A learning method of nonlinear mappings by neural networks with considering their derivatives, 1993, IJCNN.
[90] Ibrahim M. Alabdulmohsin. An Information-Theoretic Route from Generalization in Expectation to Generalization in Probability, 2017, AISTATS.
[91] A. Kraskov, et al. Estimating mutual information, 2003, Physical Review E.
[92] Maciej Krawczak. Multilayer Neural Networks: A Generalized Net Perspective, 2013.
[93] Thomas M. Cover, et al. Elements of Information Theory, 2005.
[94] S. Jansen, et al. On the notion(s) of duality for Markov processes, 2012, arXiv:1210.7193.
[95] Nicky J. Welton, et al. Value of Information, 2015, Medical Decision Making.
[96] S. Haykin, et al. Adaptive Filter Theory, 1986.