A Provably Efficient Algorithm for Training Deep Networks

We consider deep neural networks (formally equivalent to sum-product networks [19]), in which the output of each node is a quadratic function of its inputs. Similar to other deep architectures, these networks can compactly represent any function on a finite training set. The main goal of this paper is the derivation of a provably efficient, layer-by-layer algorithm for training such networks, which we denote as the Basis Learner. Unlike most, if not all, previous algorithms for training deep neural networks, our algorithm comes with formal polynomial-time convergence guarantees. Moreover, the algorithm is a universal learner in the sense that the training error is guaranteed to decrease at every iteration and can eventually reach zero under mild conditions. We present practical implementations of this algorithm, as well as preliminary but quite promising experimental results. We also compare our deep architecture to shallow architectures for learning polynomials, in particular kernel learning.
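
To make the layer-by-layer idea concrete, here is a minimal NumPy sketch of a Basis-Learner-style construction. It is an illustration under our own assumptions, not the paper's exact algorithm: the names (build_network, orthonormal_basis), the fixed layer budget, the tolerance, and the rule for selecting candidate features are all chosen for exposition.

```python
import numpy as np


def orthonormal_basis(M, tol=1e-8):
    """Orthonormal basis for the column space of M, dropping directions
    whose singular value falls below tol (illustrative rank cutoff)."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, s > tol]


def build_network(X, y, max_layers=3, tol=1e-8):
    """Greedy, layer-by-layer construction in the spirit of the Basis Learner.

    Each new feature is a product of two previously built features, so every
    node computes a quadratic function of its inputs; a candidate is kept only
    if it adds a new direction to the span of feature values on the training
    set, so the least-squares training error cannot increase as layers grow.
    """
    n = X.shape[0]
    # Layer 1: a constant feature plus the raw input coordinates.
    first = orthonormal_basis(np.column_stack([np.ones(n), X]), tol)
    basis = first.copy()

    for _ in range(max_layers):
        # Candidate features: pairwise products of current basis functions
        # with first-layer functions (each candidate is quadratic in its inputs).
        cands = np.column_stack([basis[:, i] * first[:, j]
                                 for i in range(basis.shape[1])
                                 for j in range(first.shape[1])])
        # Project out what the existing basis already explains and keep only
        # genuinely new directions.
        residual = cands - basis @ (basis.T @ cands)
        new = orthonormal_basis(residual, tol)
        if new.shape[1] == 0:
            break  # the span of representable functions stopped growing
        basis = np.column_stack([basis, new])

    # Output layer: linear least-squares fit of the labels on all features.
    w, *_ = np.linalg.lstsq(basis, y, rcond=None)
    train_err = float(np.mean((basis @ w - y) ** 2))
    return basis, w, train_err
```

Because adding a layer can only enlarge the span of constructed features, the least-squares training error in this sketch is non-increasing in the number of layers, mirroring the monotone-decrease guarantee stated for the Basis Learner above.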

[1] Honglak Lee et al. Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. ICML, 2009.

[2] Marc'Aurelio Ranzato et al. Building high-level features using large scale unsupervised learning. ICASSP, 2013.

[3] Yee Whye Teh et al. A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 2006.

[4] Pedro M. Domingos et al. Sum-product networks: A new deep architecture. ICCV Workshops, 2011.

[5] Lawrence K. Saul et al. Kernel Methods for Deep Learning. NIPS, 2009.

[6] Geoffrey E. Hinton et al. Learning representations by back-propagating errors. Nature, 1986.

[7] Martin Kreuzer et al. Computing Ideals of Points. Journal of Symbolic Computation, 2000.

[8] Peter L. Bartlett et al. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.

[9] Marc'Aurelio Ranzato et al. Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition. CVPR, 2007.

[10] Jason Weston et al. A unified architecture for natural language processing: deep neural networks with multitask learning. ICML, 2008.

[11] Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2002.

[12] Yoshua Bengio et al. An empirical evaluation of deep architectures on problems with many factors of variation. ICML, 2007.

[13] Roi Livni et al. Vanishing Component Analysis. ICML, 2013.

[14] Tomas Sauer et al. Polynomial interpolation in several variables. Advances in Computational Mathematics, 2000.

[15] Shai Ben-David et al. Localization vs. Identification of Semi-Algebraic Sets. COLT, 1993.

[16] Yoshua Bengio. Learning Deep Architectures for AI. Foundations and Trends in Machine Learning, 2009.

[17] Yoshua Bengio et al. Scaling learning algorithms towards AI. 2007.

[18] Yoshua Bengio et al. Convolutional networks for images, speech, and time series. 1998.

[19] Maria Grazia Marinari et al. Gröbner bases of ideals defined by functionals with an application to ideals of projective points. Applicable Algebra in Engineering, Communication and Computing, 1993.

[20] Yoshua Bengio et al. Shallow vs. Deep Sum-Product Networks. NIPS, 2011.

[21] Sheng Chen et al. Orthogonal least squares methods and their application to non-linear system identification. International Journal of Control, 1989.

[22] Nathan Halko et al. Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions. SIAM Review, 2011.