[1] Roman Vershynin. High-Dimensional Probability, 2018.
[2] Frank Nielsen, et al. Lightlike Neuromanifolds, Occam's Razor and Deep Learning, 2019, arXiv.
[3] Levent Sagun, et al. A jamming transition from under- to over-parametrization affects generalization in deep learning, 2018, Journal of Physics A: Mathematical and Theoretical.
[4] Koji Tsuda, et al. Legendre decomposition for tensors, 2018, NeurIPS.
[5] Alan Agresti. Categorical Data Analysis, 2003.
[6] F. Opitz. Information geometry and its applications, 2012, 9th European Radar Conference.
[7] Philip M. Long, et al. Benign overfitting in linear regression, 2019, Proceedings of the National Academy of Sciences.
[8] K. Hofmann, et al. Continuous Lattices and Domains, 2003.
[9] H. Akaike. Information Theory and an Extension of the Maximum Likelihood Principle, 1973.
[10] G. Schwarz. Estimating the Dimension of a Model, 1978.
[11] Mikhail Belkin, et al. Reconciling modern machine-learning practice and the classical bias–variance trade-off, 2018, Proceedings of the National Academy of Sciences.
[12] C. E. Shannon. A Mathematical Theory of Communication, 1948, Bell System Technical Journal.
[13] Geoffrey E. Hinton, et al. A Learning Algorithm for Boltzmann Machines, 1985, Cognitive Science.
[14] Tengyu Ma, et al. Optimal Regularization Can Mitigate Double Descent, 2020, ICLR.
[15] Jorma Rissanen. Stochastic Complexity in Learning, 1995, Journal of Computer and System Sciences.
[16] Taiji Suzuki, et al. Generalization of Two-layer Neural Networks: An Asymptotic Viewpoint, 2020, ICLR.
[17] Maxim Raginsky, et al. Information-theoretic analysis of generalization capability of learning algorithms, 2017, NIPS.
[18] Vladimir Vapnik. The Nature of Statistical Learning Theory, 1995.
[19] David Tse, et al. Fundamentals of Wireless Communication, 2005.
[20] Vladimir N. Vapnik. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.
[21] A. Barron, et al. Jeffreys' prior is asymptotically least favorable under entropy risk, 1994.
[22] Levent Sagun, et al. Scaling description of generalization with number of parameters in deep learning, 2019, Journal of Statistical Mechanics: Theory and Experiment.
[23] Boaz Barak, et al. Deep double descent: where bigger models and more data hurt, 2019, ICLR.
[24] Mikhail Belkin, et al. To understand deep learning we need to understand kernel learning, 2018, ICML.
[25] Shun-ichi Amari. Information geometry of neural network—an overview, 1997.
[26] Koji Tsuda, et al. Information decomposition on structured space, 2016, IEEE International Symposium on Information Theory (ISIT).
[27] Shun-ichi Amari. Information geometry on hierarchy of probability distributions, 2001, IEEE Transactions on Information Theory.
[28] Shun-ichi Amari. Natural Gradient Works Efficiently in Learning, 1998, Neural Computation.
[29] Andrea Montanari, et al. Linearized two-layers neural networks in high dimension, 2019, The Annals of Statistics.
[30] Christopher M. Bishop. Pattern Recognition and Machine Learning, 2006.
[31] Vijay Balasubramanian. A Geometric Formulation of Occam's Razor For Inference of Parametric Distributions, 1996, adap-org/9601001.
[32] Andrea Montanari, et al. Surprises in High-Dimensional Ridgeless Least Squares Interpolation, 2019, Annals of Statistics.
[33] V. Balasubramanian. MDL, Bayesian Inference and the Geometry of the Space of Probability Distributions, 2006.
[34] Trevor Hastie, et al. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2001.
[35] Sumio Watanabe. Algebraic Geometry and Statistical Learning Theory, 2009.
[36] M. S. Pinsker. Information and Information Stability of Random Variables and Processes (translated by Amiel Feinstein), 1964.
[37] H. Jeffreys. An invariant form for the prior probability in estimation problems, 1946, Proceedings of the Royal Society of London. Series A. Mathematical and Physical Sciences.
[38] Shun-ichi Amari, et al. Methods of Information Geometry, 2000.
[39] Brian A. Davey, et al. An Introduction to Lattices and Order, 1989.
[40] Koji Tsuda, et al. Tensor Balancing on Statistical Manifold, 2017, ICML.
[41] Jorma Rissanen. Fisher information and stochastic complexity, 1996, IEEE Transactions on Information Theory.
[42] C. R. Rao. Information and the Accuracy Attainable in the Estimation of Statistical Parameters, 1992.
[43] Florent Krzakala, et al. Double Trouble in Double Descent: Bias and Variance(s) in the Lazy Regime, 2020, ICML.
[44] Emre Telatar. Capacity of Multi-antenna Gaussian Channels, 1999, European Transactions on Telecommunications.
[45] Levent Sagun, et al. The jamming transition as a paradigm to understand the loss landscape of deep neural networks, 2018, Physical Review E.