Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior

We describe an approach to understanding the peculiar and counterintuitive generalization properties of deep neural networks. The approach involves going beyond the worst-case capacity-control frameworks that have been popular in machine learning in recent years and revisiting older ideas from the statistical mechanics of neural networks. Within this approach, we present a prototypical Very Simple Deep Learning (VSDL) model whose behavior is governed by two control parameters: one describing the effective amount of data, or load, on the network (which decreases when noise is added to the input), and one with an effective temperature interpretation (which increases when learning algorithms are stopped early). Using this model, we describe how a straightforward application of ideas from the statistical mechanics theory of generalization provides a strong qualitative account of recently observed empirical results, including the inability of deep neural networks to avoid overfitting training data, as well as discontinuous learning and sharp transitions in the generalization properties of learning algorithms.
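To make the two control parameters concrete, the following is a minimal, hypothetical sketch (not the paper's VSDL model or experiments) of how one might sweep them on a toy teacher-student problem: label noise on the training set stands in for lowering the effective load on the network, and the number of gradient-descent epochs stands in for an effective temperature knob via early stopping. The teacher/logistic-regression setup, function names, and parameter values below are illustrative assumptions; in the statistical-mechanics picture one would watch how train and test accuracy change as these two knobs are swept, looking for sharp transitions rather than smooth worst-case bounds.

```python
# Toy sketch (assumed setup, not from the paper): sweep a "load" knob
# (label-noise fraction) and a "temperature" knob (early-stopping epoch count)
# for a simple logistic-regression student trained on a linear teacher.
import numpy as np

rng = np.random.default_rng(0)
d = 50
w_true = rng.normal(size=d)  # teacher weights shared by train and test sets

def make_data(n, noise_frac):
    """Teacher-labeled Gaussian inputs; a fraction of labels is flipped,
    which lowers the effective load (clean data per parameter)."""
    X = rng.normal(size=(n, d))
    y = np.sign(X @ w_true)
    flip = rng.random(n) < noise_frac
    y[flip] *= -1.0
    return X, y

def train(X, y, epochs, lr=0.5):
    """Plain full-batch gradient descent on the logistic loss; stopping after
    fewer epochs plays the role of raising an effective temperature."""
    w = np.zeros(d)
    for _ in range(epochs):
        margins = np.clip(y * (X @ w), -30.0, 30.0)
        # gradient of the mean logistic loss log(1 + exp(-y * w.x))
        grad = -(X * (y / (1.0 + np.exp(margins)))[:, None]).mean(axis=0)
        w -= lr * grad
    return w

def accuracy(w, X, y):
    return float(np.mean(np.sign(X @ w) == y))

X_test, y_test = make_data(2000, noise_frac=0.0)

for noise in (0.0, 0.2, 0.4):      # knob 1: more label noise ~ lower effective load
    for epochs in (5, 50, 500):    # knob 2: earlier stopping ~ higher effective temperature
        X_tr, y_tr = make_data(200, noise_frac=noise)
        w = train(X_tr, y_tr, epochs)
        print(f"noise={noise:.1f}  epochs={epochs:4d}  "
              f"train_acc={accuracy(w, X_tr, y_tr):.2f}  "
              f"test_acc={accuracy(w, X_test, y_test):.2f}")
```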
