On the Expressive Power of Deep Polynomial Neural Networks

We study deep neural networks with polynomial activations, particularly their expressive power. For a fixed architecture and activation degree, a polynomial neural network defines an algebraic map from weights to polynomials. The image of this map is the functional space associated with the network, and it is an irreducible algebraic variety upon taking closure. This paper proposes the dimension of this variety as a precise measure of the expressive power of polynomial neural networks. We obtain several theoretical results regarding this dimension as a function of architecture, including an exact formula for high activation degrees, as well as upper and lower bounds on layer widths in order for deep polynomial networks to fill the ambient functional space. We also present computational evidence that, in terms of expressiveness, it is profitable for layer widths to increase monotonically and then decrease monotonically. Finally, we link our study to favorable optimization properties when training weights, and we draw intriguing connections with tensor and polynomial decompositions.
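
As a concrete illustration of the weight-to-coefficients map and the proposed dimension measure, the following minimal sketch (not from the paper; the architecture, activation degree, and variable names are illustrative assumptions) builds a two-layer network with activation x ↦ x^r in SymPy and estimates the dimension of the functional variety as the rank of the Jacobian of the coefficient map at a random, hence generic, choice of weights.

```python
# Minimal sketch (assumed example, not the paper's code): architecture
# (d0, d1, d2) = (3, 2, 1) with activation x -> x^2.  The network output is a
# ternary quadratic form; its coefficients are polynomials in the weights, and
# the generic rank of the Jacobian of this coefficient map equals the
# dimension of (the closure of) the functional variety.
import itertools
import random

import sympy as sp

d0, d1, d2, r = 3, 2, 1, 2            # layer widths and activation degree

x = sp.symbols(f"x0:{d0}")
W1 = sp.Matrix(d1, d0, list(sp.symbols(f"a0:{d1 * d0}")))
W2 = sp.Matrix(d2, d1, list(sp.symbols(f"b0:{d2 * d1}")))
weights = list(W1) + list(W2)

# Forward pass: linear layer, coordinate-wise power activation, linear layer.
h = (W1 * sp.Matrix(x)).applyfunc(lambda t: t**r)
out = sp.expand((W2 * h)[0])          # homogeneous polynomial of degree r in x

# The algebraic map from weights to the coefficients of the output polynomial,
# written in the monomial basis of degree r.
monos = [sp.Mul(*c) for c in itertools.combinations_with_replacement(x, r)]
poly = sp.Poly(out, *x)
coeffs = sp.Matrix([poly.coeff_monomial(m) for m in monos])

# Dimension of the functional variety = Jacobian rank at a random (generic) point.
J = coeffs.jacobian(weights)
point = {w: sp.Integer(random.randint(2, 10**6)) for w in weights}
dim_functional = J.subs(point).rank()

# For this choice the image is the hypersurface of rank-deficient ternary
# quadratic forms, so this should print "functional dim 5 of ambient dim 6".
print(f"functional dim {dim_functional} of ambient dim {len(monos)}")
```

Making the intermediate layer wider (for instance d1 = 3) should let the network fill the whole 6-dimensional ambient space of quadratic forms, matching the abstract's point that sufficient layer widths are needed for the functional variety to fill the ambient space.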
