[1] A. A. Mullin,et al. Principles of neurodynamics , 1962 .
[2] W. Little. The existence of persistent states in the brain , 1974 .
[3] A. N. Tikhonov,et al. Solutions of ill-posed problems , 1977 .
[4] J. J. Hopfield,et al. Neural networks and physical systems with emergent collective computational abilities, 1982, Proceedings of the National Academy of Sciences of the United States of America.
[5] Leslie G. Valiant,et al. A theory of the learnable , 1984, STOC '84.
[6] Geoffrey E. Hinton,et al. Learning representations by back-propagating errors , 1986, Nature.
[7] Geoffrey E. Hinton,et al. Learning and relearning in Boltzmann machines , 1986 .
[8] P. Carnevali,et al. Exhaustive Thermodynamical Analysis of Boolean Learning Networks , 1987 .
[9] Esther Levin,et al. A statistical approach to learning and generalization in layered neural networks , 1989, Proc. IEEE.
[10] E. Gardner,et al. Three unfinished works on the optimal storage capacity of networks , 1989 .
[11] Györgyi,et al. First-order transition to perfect generalization in a neural network with binary synapses. , 1990, Physical review. A, Atomic, molecular, and optical physics.
[12] Rose,et al. Statistical mechanics and phase transitions in clustering. , 1990, Physical review letters.
[13] Kanter,et al. Statistical mechanics of a multilayered neural network. , 1990, Physical review letters.
[14] Sompolinsky,et al. Learning from examples in large neural networks. , 1990, Physical review letters.
[15] Yann LeCun,et al. Constrained neural networks for pattern recognition , 1991 .
[16] D. Hansel,et al. Memorization without generalization in a multilayered neural network , 1992 .
[17] Albrecht Rau,et al. Statistical mechanics of neural networks , 1992 .
[18] Sompolinsky,et al. Statistical mechanics of learning from examples. , 1992, Physical review. A, Atomic, molecular, and optical physics.
[19] John A. Hertz,et al. Statistical Mechanics of Learning in a Large Committee Machine , 1992, NIPS.
[20] Solla,et al. Learning in linear neural networks: The validity of the annealed approximation. , 1992, Physical review. A, Atomic, molecular, and optical physics.
[21] Shun-ichi Amari,et al. Four Types of Learning Curves , 1992, Neural Computation.
[22] Gerald Tesauro,et al. How Tight Are the Vapnik-Chervonenkis Bounds? , 1992, Neural Computation.
[23] H. Schwarze. Learning a rule in a multilayer neural network , 1993 .
[24] T. Watkin,et al. The statistical mechanics of learning a rule, 1993.
[25] A. Engel,et al. Statistical mechanics calculation of Vapnik-Chervonenkis bounds for perceptrons , 1993 .
[26] Oh,et al. Generalization in a two-layer neural network. , 1993, Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics.
[27] Heekuck Oh,et al. Neural Networks for Pattern Recognition , 1993, Adv. Comput..
[28] Opper,et al. Learning and generalization in a two-layer neural network: The role of the Vapnik-Chervonenkis dimension, 1994, Physical review letters.
[29] A. Engel,et al. Reliability of Replica Symmetry for the Generalization Problem of a Toy Multilayer Neural Network , 1994 .
[30] Yann LeCun,et al. Measuring the VC-Dimension of a Learning Machine , 1994, Neural Computation.
[31] Y. Kabashima. Perfect loss of generalization due to noise in K=2 parity machines , 1994 .
[32] D. Haussler,et al. Rigorous learning curve bounds from statistical mechanics , 1994, COLT '94.
[33] Michael Biehl,et al. On-line backpropagation in two-layered neural networks , 1995 .
[34] Manfred Opper. Perceptron Learning: the Largest Version Space, 1995.
[35] N. Caticha,et al. On-line learning in the committee machine , 1995 .
[36] C. Van den Broeck,et al. Storage capacity and generalization error for the reversed-wedge Ising perceptron, 1995, Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics.
[37] S. Kak. Information, physics, and computation , 1996 .
[38] N. Caticha,et al. On-line learning in parity machines , 1996 .
[39] Kinouchi,et al. Equivalence between learning in noisy perceptrons and tree committee machines. , 1996, Physical review. E, Statistical physics, plasmas, fluids, and related interdisciplinary topics.
[40] Peter L. Bartlett,et al. For Valid Generalization the Size of the Weights is More Important than the Size of the Network , 1996, NIPS.
[41] Klaus Schulten,et al. A Numerical Study on Learning Curves in Stochastic Multilayer Feedforward Networks , 1996, Neural Computation.
[42] Michael Biehl,et al. Transient dynamics of on-line learning in two-layered neural networks , 1996 .
[43] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.
[44] Vladimir Vapnik,et al. Statistical learning theory , 1998 .
[45] S. Bös. Statistical mechanics approach to early stopping and weight decay, 1998.
[46] M. Talagrand. Replica symmetry breaking and exponential inequalities for the Sherrington-Kirkpatrick model , 2000 .
[47] Peter L. Bartlett,et al. Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , 2003, J. Mach. Learn. Res..
[48] Christian Van den Broeck,et al. Statistical Mechanics of Learning , 2001 .
[49] Andreas Engel,et al. Complexity of learning in artificial neural networks , 2001, Theor. Comput. Sci..
[50] V. Akila,et al. Information, 2001, The Lancet.
[51] E. Bolthausen,et al. The Random Energy Model , 2002 .
[52] Eric R. Ziegel,et al. The Elements of Statistical Learning , 2003, Technometrics.
[53] Ronald,et al. Learning representations by backpropagating errors , 2004 .
[54] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.
[55] Y. Yao,et al. On Early Stopping in Gradient Descent Learning , 2007 .
[56] Michael W. Mahoney,et al. Learning with Spectral Kernels and Heavy-Tailed Data , 2009, ArXiv.
[57] Surya Ganguli,et al. Statistical mechanics of compressed sensing. , 2010, Physical review letters.
[58] Antonio Auffinger,et al. Random Matrices and Complexity of Spin Glasses , 2010, 1003.1129.
[59] Michael W. Mahoney,et al. Implementing regularization implicitly via approximate eigenvector computation , 2010, ICML.
[60] Michael W. Mahoney,et al. Regularized Laplacian Estimation and Fast Eigenvector Approximation , 2011, NIPS.
[61] Lutz Prechelt,et al. Early Stopping - But When? , 2012, Neural Networks: Tricks of the Trade.
[62] Klaus-Robert Müller,et al. Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.
[63] Michael W. Mahoney. Approximate computation and implicit regularization for very large-scale data analysis , 2012, PODS.
[64] Pascal Vincent,et al. Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[65] Y. Fyodorov. High-Dimensional Random Fields and Random Matrix Theory , 2013, 1307.2379.
[66] S. Ganguli,et al. Statistical mechanics of complex neural systems and high dimensional data , 2013, 1301.7115.
[67] Surya Ganguli,et al. On the saddle point problem for non-convex optimization , 2014, ArXiv.
[68] Surya Ganguli,et al. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization , 2014, NIPS.
[69] Joan Bruna,et al. Training Convolutional Networks with Noisy Labels , 2014, ICLR 2014.
[70] Léon Bottou,et al. Making Vapnik–Chervonenkis Bounds Accurate , 2015 .
[71] Ryota Tomioka,et al. In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning , 2014, ICLR.
[72] Oriol Vinyals,et al. Qualitatively characterizing neural network optimization problems , 2014, ICLR.
[73] Yann LeCun,et al. Explorations on high dimensional landscapes , 2014, ICLR.
[74] Florent Krzakala,et al. Statistical physics of inference: thresholds and algorithms , 2015, ArXiv.
[75] Stefano Soatto,et al. On the energy landscape of deep networks , 2015, 1511.06485.
[76] Yann LeCun,et al. The Loss Surfaces of Multilayer Networks , 2014, AISTATS.
[77] P. Chaudhari,et al. The Effect of Gradient Noise on the Energy Landscape of Deep Networks , 2015 .
[78] Jonathan Krause,et al. The Unreasonable Effectiveness of Noisy Data for Fine-Grained Recognition , 2015, ECCV.
[79] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[80] Surya Ganguli,et al. Statistical Mechanics of Optimal Convex Inference in High Dimensions , 2016 .
[81] Seyed-Mohsen Moosavi-Dezfooli,et al. Universal Adversarial Perturbations , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[82] Naftali Tishby,et al. Opening the Black Box of Deep Neural Networks via Information , 2017, ArXiv.
[83] Leslie Pack Kaelbling,et al. Generalization in Deep Learning , 2017, ArXiv.
[84] Samy Bengio,et al. Understanding deep learning requires rethinking generalization , 2016, ICLR.
[85] Razvan Pascanu,et al. Sharp Minima Can Generalize For Deep Nets , 2017, ICML.
[86] Jorge Nocedal,et al. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima , 2016, ICLR.
[87] Nir Shavit,et al. Deep Learning is Robust to Massive Label Noise , 2017, ArXiv.
[88] Matus Telgarsky,et al. Spectrally-normalized margin bounds for neural networks , 2017, NIPS.
[89] Mario Michael Krell,et al. A Capacity Scaling Law for Artificial Neural Networks , 2017, ArXiv.
[90] Joseph Gonzalez,et al. On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent , 2018, ArXiv.
[91] Tomaso A. Poggio,et al. Theory IIIb: Generalization in Deep Networks , 2018, ArXiv.
[92] Tomaso A. Poggio,et al. A Surprising Linear Relationship Predicts Test Performance in Deep Networks , 2018, ArXiv.
[93] Jascha Sohl-Dickstein,et al. Measuring the Effects of Data Parallelism on Neural Network Training , 2018, J. Mach. Learn. Res..
[94] M. Kearns. On the consequences of the statistical mechanics theory of learning curves for the model selection problem.