The Classification Game: Complexity Regularization through Interaction

We show that if a population of neural network agents is allowed to interact during learning, so as to arrive at a consensus solution to the learning problem, then the agents can implicitly achieve complexity regularization. We call this learning paradigm the classification game. We characterize the game-theoretic equilibria of this system and show how low-complexity equilibria get selected. The benefit of finding a low-complexity solution is better expected generalization. We demonstrate this benefit through experiments.
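The abstract leaves the interaction protocol unspecified. As one plausible instantiation, the sketch below pairs agents in speaker/hearer roles each round: the speaker encodes an input into an internal "form", the hearer classifies that form, and both update toward the true label, so every encoder must produce forms that any other agent's decoder can read. All specifics here (the Agent class, play_round, the XOR task, network sizes, learning rate) are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Agent:
    """One player: an encoder (input -> 'form') and a decoder (form -> label)."""
    def __init__(self, n_in, n_form, lr=0.5):
        self.W_enc = rng.normal(0.0, 0.5, (n_form, n_in))
        self.W_dec = rng.normal(0.0, 0.5, (1, n_form))
        self.lr = lr

    def encode(self, x):
        return sigmoid(self.W_enc @ x)

    def decode(self, form):
        return sigmoid(self.W_dec @ form)

def play_round(speaker, hearer, x, y):
    """Speaker encodes x; hearer classifies the form; both update toward y."""
    form = speaker.encode(x)
    y_hat = hearer.decode(form)
    err = y_hat - y                          # dL/dz for sigmoid output + cross-entropy
    # Backprop through the hearer's (pre-update) decoder to get the form gradient.
    d_form = (hearer.W_dec.T @ err) * form * (1.0 - form)
    hearer.W_dec -= hearer.lr * np.outer(err, form)    # hearer learns to read this form
    speaker.W_enc -= speaker.lr * np.outer(d_form, x)  # speaker learns a readable form

# Toy task: XOR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([0.0, 1.0, 1.0, 0.0])

agents = [Agent(n_in=2, n_form=4) for _ in range(5)]
for step in range(50000):
    i, j = rng.choice(len(agents), size=2, replace=False)  # random speaker/hearer pair
    k = rng.integers(len(X))
    play_round(agents[i], agents[j], X[k], Y[k])

# Evaluate consensus: average prediction over all ordered speaker/hearer pairs.
for x, y in zip(X, Y):
    preds = [h.decode(s.encode(x))[0] for s in agents for h in agents if s is not h]
    print(x, y, round(float(np.mean(preds)), 2))
```

In this reading, consensus pressure is the regularizer: an idiosyncratic, overly complex encoding works only with the encoder's usual partner, while a simple shared code works with every hearer, which is one way the low-complexity equilibria described above could get selected.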
