Size and Depth Separation in Approximating Benign Functions with Neural Networks

When studying the expressive power of neural networks, a main challenge is to understand how the size and depth of the network affect its ability to approximate real functions. However, not all functions are interesting from a practical viewpoint: functions of interest usually have a polynomially-bounded Lipschitz constant, and can be computed efficiently. We call functions that satisfy these conditions “benign”, and explore the benefits of size and depth for approximating benign functions with ReLU networks. As we show, this problem is more challenging than the corresponding problem for non-benign functions. We give complexity-theoretic barriers to showing depth lower bounds: proving the existence of a benign function that cannot be approximated by polynomial-size networks of depth 4 would settle long-standing open problems in computational complexity. This implies that beyond depth 4 there is a barrier to showing depth-separation for benign functions, even between networks of constant depth and networks of non-constant depth. We also study size-separation, namely, whether there are benign functions that can be approximated with networks of size O(s(d)), but not with networks of size O(s′(d)). We show a complexity-theoretic barrier to proving such results beyond size O(d log(d)), but we also give an explicit benign function that can be approximated with networks of size O(d) but not with networks of size o(d/log(d)). For approximation in the L∞ sense, we achieve such a separation already between size O(d) and size o(d). Moreover, we show superpolynomial size lower bounds and barriers to such lower bounds, depending on the assumptions on the function. Our size-separation results rely on an analysis of size lower bounds for Boolean functions, which is of independent interest: we show linear size lower bounds for computing explicit Boolean functions (such as set disjointness) with neural networks and threshold circuits.
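To make the Boolean connection concrete, here is a minimal illustrative sketch (our own toy construction, not taken from the paper) of the easy upper-bound side of that last claim: on Boolean inputs in {0,1}^d × {0,1}^d, set disjointness is computed exactly by the depth-2 ReLU network DISJ(x, y) = ReLU(1 − Σ_i ReLU(x_i + y_i − 1)), which uses d + 1 ReLU units, i.e. size O(d). The names below (relu, disjointness_relu_net) are purely illustrative; the paper's contribution is the matching linear lower bound, not this construction.

    import numpy as np

    def relu(z):
        return np.maximum(z, 0.0)

    def disjointness_relu_net(x, y):
        # First layer: ReLU(x_i + y_i - 1) equals x_i AND y_i on Boolean inputs.
        ands = relu(x + y - 1.0)          # d hidden units
        # Output unit: ReLU(1 - sum_i AND_i) is 1 exactly when the intersection is empty.
        return relu(1.0 - ands.sum())

    # Sanity check against the Boolean definition of disjointness on random inputs.
    rng = np.random.default_rng(0)
    d = 16
    for _ in range(1000):
        x, y = rng.integers(0, 2, d), rng.integers(0, 2, d)
        assert disjointness_relu_net(x, y) == float(not np.any(x & y))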
