Frank Nielsen | Ke Sun
[1] Surya Ganguli, et al. On the Expressive Power of Deep Neural Networks, 2016, ICML.
[2] Yann Dauphin, et al. Empirical Analysis of the Hessian of Over-Parametrized Neural Networks, 2017, ICLR.
[3] Demir N. Kupeli. Singular Semi-Riemannian Geometry, 1996.
[4] Frederik Kunstner, et al. Limitations of the empirical Fisher approximation for natural gradient descent, 2019, NeurIPS.
[5] Nicolas Le Roux, et al. Negative eigenvalues of the Hessian in deep neural networks, 2018, ICLR.
[6] Adam Gaier, et al. Weight Agnostic Neural Networks, 2019, NeurIPS.
[7] Ran El-Yaniv, et al. Binarized Neural Networks, 2016, arXiv.
[8] V. Jain, et al. On the geometry of lightlike submanifolds of indefinite statistical manifolds, 2019, arXiv:1903.07387.
[9] Xaq Pitkow, et al. Skip Connections Eliminate Singularities, 2017, ICLR.
[10] Masato Okada, et al. Dynamics of Learning in MLP: Natural Gradient and Singularity Revisited, 2018, Neural Computation.
[11] G. Schwarz. Estimating the Dimension of a Model, 1978.
[12] Jeffrey Pennington, et al. The Spectrum of the Fisher Information Matrix of a Single-Hidden-Layer Neural Network, 2018, NeurIPS.
[13] Surya Ganguli, et al. Deep Information Propagation, 2016, ICLR.
[14] Philip Thomas, et al. GeNGA: A Generalization of Natural Gradient Ascent with Positive and Negative Convergence Results, 2014, ICML.
[15] H. Akaike. A new look at the statistical model identification, 1974.
[16] Samy Bengio, et al. Understanding deep learning requires rethinking generalization, 2016, ICLR.
[17] Surya Ganguli, et al. The Emergence of Spectral Universality in Deep Networks, 2018, AISTATS.
[18] Razvan Pascanu, et al. Sharp Minima Can Generalize For Deep Nets, 2017, ICML.
[19] T. Aoki, et al. On the category of stratifolds, 2016, arXiv:1605.04142.
[20] Yee Whye Teh, et al. A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation, 2006, NIPS.
[21] Shun-ichi Amari, et al. Universal statistics of Fisher information in deep neural networks: mean field approach, 2018, AISTATS.
[22] Jorma Rissanen, et al. The Minimum Description Length Principle in Coding and Modeling, 1998, IEEE Trans. Inf. Theory.
[23] Yoshua Bengio, et al. Deep Sparse Rectifier Neural Networks, 2011, AISTATS.
[24] Shun-ichi Amari, et al. Dynamics of Learning Near Singularities in Layered Networks, 2008, Neural Computation.
[25] Surya Ganguli, et al. Exponential expressivity in deep neural networks through transient chaos, 2016, NIPS.
[26] Chico Q. Camargo, et al. Deep learning generalizes because the parameter-function map is biased towards simple functions, 2018, ICLR.
[27] K. L. Duggal. A Review on Unique Existence Theorems in Lightlike Geometry, 2014.
[28] Zhinan Zhang, et al. The rank of a random matrix, 2007, Appl. Math. Comput.
[29] V. Marčenko, et al. Distribution of eigenvalues for some sets of random matrices, 1967.
[30] P. Grünwald. The Minimum Description Length Principle (Adaptive Computation and Machine Learning), 2007.
[31] Tao Zhang, et al. A Survey of Model Compression and Acceleration for Deep Neural Networks, 2017, arXiv.
[32] A. Guionnet, et al. Free probability and random matrices, 2012.
[33] D. C. Kay. Schaum's Outline of Theory and Problems of Tensor Calculus, 1988.
[34] F. Opitz. Information geometry and its applications, 2012, 9th European Radar Conference.
[35] I. J. Myung, et al. Counting probability distributions: Differential geometry and model selection, 2000, Proc. Natl. Acad. Sci. USA.
[36] Krishan L. Duggal, et al. Lightlike Submanifolds of Semi-Riemannian Manifolds and Applications, 1996.
[37] C. S. Wallace, et al. An Information Measure for Classification, 1968, Comput. J.
[38] Jeffrey Pennington, et al. Geometry of Neural Network Loss Surfaces via Random Matrix Theory, 2017, ICML.
[39] Frank Nielsen, et al. Relative Fisher Information and Natural Gradient for Learning Large Modular Models, 2017, ICML.
[40] V. Balasubramanian. MDL, Bayesian Inference and the Geometry of the Space of Probability Distributions, 2006.
[41] Katsumi Nomizu, et al. Affine differential geometry: geometry of affine immersions, 1994.
[42] Masato Okada, et al. Statistical mechanical analysis of learning dynamics of two-layer perceptron with multiple output units, 2019, Journal of Physics A: Mathematical and Theoretical.
[43] Mikhail Belkin, et al. Reconciling modern machine learning and the bias-variance trade-off, 2018, arXiv.
[44] Yann Ollivier, et al. The Description Length of Deep Learning Models, 2018, NeurIPS.
[45] C. R. Rao, et al. Information and the Accuracy Attainable in the Estimation of Statistical Parameters, 1992.
[46] J. Rissanen, et al. Modeling by Shortest Data Description, 1978, Autom.
[47] Nathan Srebro, et al. Exploring Generalization in Deep Learning, 2017, NIPS.
[48] Jason Yosinski, et al. Measuring the Intrinsic Dimension of Objective Landscapes, 2018, ICLR.
[49] T. Roos, et al. Minimum Description Length Revisited, 2019, International Journal of Mathematics for Industry.
[50] Mikhail Belkin, et al. Reconciling modern machine-learning practice and the classical bias–variance trade-off, 2018, Proceedings of the National Academy of Sciences.
[51] M. Tripathi, et al. Geometry of lightlike hypersurfaces of a statistical manifold, 2019, arXiv:1901.09251.
[52] Sumio Watanabe. Algebraic geometry and statistical learning theory, 2009.
[53] Razvan Pascanu, et al. Revisiting Natural Gradient for Deep Networks, 2013, ICLR.
[54] Patrick D. McDaniel, et al. Making machine learning robust against adversarial inputs, 2018, Commun. ACM.
[55] Tomaso A. Poggio, et al. Fisher-Rao Metric, Geometry, and Complexity of Neural Networks, 2017, AISTATS.
[57] Mikhail Belkin, et al. Two models of double descent for weak features, 2019, SIAM J. Math. Data Sci.
[58] Jorma Rissanen, et al. Fisher information and stochastic complexity, 1996, IEEE Trans. Inf. Theory.