In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning

We present experiments demonstrating that a form of capacity control distinct from network size plays a central role in learning multilayer feed-forward networks. We argue, partly by analogy to matrix factorization, that this implicit inductive bias can help shed light on generalization in deep learning.
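As a rough illustration of the kind of experiment described above (not the paper's exact protocol), the sketch below trains one-hidden-layer ReLU networks of increasing width H on a synthetic task and reports train and test error; the dataset, loss, optimizer, and all hyperparameters are illustrative assumptions. The point it mirrors is that test error need not degrade as the network grows well past the size needed to fit the training data.

```python
import numpy as np

# Illustrative sketch only: synthetic data and hyperparameters are assumptions,
# not the paper's experimental setup.
rng = np.random.default_rng(0)

def make_data(n, d=20):
    # Binary labels from a fixed random linear "teacher".
    X = rng.standard_normal((n, d))
    w = rng.standard_normal(d)
    y = np.sign(X @ w)  # labels in {-1, +1}
    return X, y

def train_mlp(X, y, H, epochs=200, lr=0.05):
    # One-hidden-layer ReLU network trained with full-batch gradient
    # descent on squared loss.
    n, d = X.shape
    W1 = rng.standard_normal((d, H)) / np.sqrt(d)  # input -> hidden
    W2 = rng.standard_normal(H) / np.sqrt(H)       # hidden -> output
    for _ in range(epochs):
        A = np.maximum(X @ W1, 0.0)                # ReLU activations
        g = (A @ W2 - y) / n                       # d(loss)/d(output)
        gW2 = A.T @ g
        gW1 = X.T @ (np.outer(g, W2) * (A > 0))    # backprop through ReLU
        W1 -= lr * gW1
        W2 -= lr * gW2
    return W1, W2

def error(W1, W2, X, y):
    pred = np.sign(np.maximum(X @ W1, 0.0) @ W2)
    return np.mean(pred != y)

Xtr, ytr = make_data(200)
Xte, yte = make_data(2000)
for H in [4, 16, 64, 256, 1024]:
    W1, W2 = train_mlp(Xtr, ytr, H)
    print(f"H={H:5d}  train err={error(W1, W2, Xtr, ytr):.3f}  "
          f"test err={error(W1, W2, Xte, yte):.3f}")
```

If size alone governed capacity, test error should rise sharply with H once the training set is fit; the paper's observation is that it does not, pointing to some other (implicit) form of capacity control.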
