In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning

We present experiments demonstrating that a form of capacity control distinct from network size plays a central role in learning multilayer feed-forward networks. We argue, partly by analogy to matrix factorization, that this implicit inductive bias can help shed light on generalization in deep learning.
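As a rough illustration of the kind of experiment described above (not the paper's exact protocol), the sketch below trains one-hidden-layer ReLU networks of increasing width H on a synthetic task and reports train and test error; the dataset, loss, optimizer, and all hyperparameters are illustrative assumptions. The point it mirrors is that test error need not degrade as the network grows well past the size needed to fit the training data.

```python
import numpy as np

# Illustrative sketch only: synthetic data and hyperparameters are assumptions,
# not the paper's experimental setup.
rng = np.random.default_rng(0)

def make_data(n, d=20):
    # Binary labels from a fixed random linear "teacher".
    X = rng.standard_normal((n, d))
    w = rng.standard_normal(d)
    y = np.sign(X @ w)  # labels in {-1, +1}
    return X, y

def train_mlp(X, y, H, epochs=200, lr=0.05):
    # One-hidden-layer ReLU network trained with full-batch gradient
    # descent on squared loss.
    n, d = X.shape
    W1 = rng.standard_normal((d, H)) / np.sqrt(d)  # input -> hidden
    W2 = rng.standard_normal(H) / np.sqrt(H)       # hidden -> output
    for _ in range(epochs):
        A = np.maximum(X @ W1, 0.0)                # ReLU activations
        g = (A @ W2 - y) / n                       # d(loss)/d(output)
        gW2 = A.T @ g
        gW1 = X.T @ (np.outer(g, W2) * (A > 0))    # backprop through ReLU
        W1 -= lr * gW1
        W2 -= lr * gW2
    return W1, W2

def error(W1, W2, X, y):
    pred = np.sign(np.maximum(X @ W1, 0.0) @ W2)
    return np.mean(pred != y)

Xtr, ytr = make_data(200)
Xte, yte = make_data(2000)
for H in [4, 16, 64, 256, 1024]:
    W1, W2 = train_mlp(Xtr, ytr, H)
    print(f"H={H:5d}  train err={error(W1, W2, Xtr, ytr):.3f}  "
          f"test err={error(W1, W2, Xte, yte):.3f}")
```

If size alone governed capacity, test error should rise sharply with H once the training set is fit; the paper's observation is that it does not, pointing to some other (implicit) form of capacity control.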
