Deviation Theorems for Solutions of Differential Equations and Applications to Lower Bounds on Parallel Complexity of Sigmoids

Abstract By a sigmoid of depth d we understand a circuit with d layers in which each real function computed at the (i+1)th layer is obtained as G(q), where q is a rational expression in the functions computed at the ith layer and G is a gate operator from some admitted family. Two types of families of gate operators are considered. In the first, we allow the substitution g(q), where g is a solution of a linear ordinary differential equation with polynomial coefficients; in the second, G(q) is a solution of a nonlinear first-order differential equation. Sigmoids of the first type compute any composition of functions such as exp, log, sin (thus they include, in particular, the standard sigmoids corresponding to the gate g = (1+exp(−x))^{−1}); sigmoids of the second type compute Pfaffian functions. The main result states that if two different functions f_1, f_2 are computed by sigmoids of parallel complexity d, then the difference |f_1−f_2| grows not slower than (exp^{(d)}(p))^{−1} (and not faster than exp^{(d)}(p)), where exp^{(d)} is the d-fold iteration of the exponential function and p is a certain polynomial. Thus a function f_1 whose parallel complexity is exactly d cannot be approximated well by a function f_2 of smaller parallel complexity. We also estimate the number of zeroes in intervals of a function computed by a sigmoid of the first type. All the obtained bounds are sharp.
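
As an informal illustration of the objects named in the abstract, the following minimal sketch evaluates a layered circuit with the standard gate and computes the d-fold iterated exponential that appears in the bounds. The names iterated_exp, standard_gate and evaluate_circuit, and the representation of a layer as a list of Python callables, are assumptions made for this example only; they are not taken from the paper, and the sketch does not verify the deviation theorem itself.

```python
import math


def iterated_exp(d, t):
    """Apply exp to t exactly d times (d = 0 returns t unchanged)."""
    value = t
    for _ in range(d):
        value = math.exp(value)
    return value


def standard_gate(x):
    """The standard sigmoid gate g(x) = (1 + exp(-x))^(-1)."""
    return 1.0 / (1.0 + math.exp(-x))


def evaluate_circuit(layers, x, gate=standard_gate):
    """Evaluate a layered circuit at the real input x.

    Hypothetical representation: `layers` is a list with one entry per
    layer; each entry is a list of callables q, where q maps the list of
    values from the previous layer to a real number (standing in for a
    rational expression). Each output of a layer is gate(q(previous)).
    """
    values = [x]  # layer 0 holds only the input variable
    for layer in layers:
        values = [gate(q(values)) for q in layer]
    return values


# A depth-2 example: the first layer has two gates, the second has one.
example_layers = [
    [lambda v: 2.0 * v[0], lambda v: v[0] ** 2 + 1.0],
    [lambda v: (v[0] - v[1]) / (1.0 + v[1])],
]
outputs = evaluate_circuit(example_layers, 0.5)
```

Here the depth is the length of example_layers, and iterated_exp(d, p(x)) gives the scale against which the abstract measures the difference of two functions of parallel complexity d. Note that iterated_exp overflows floating-point range for even modest d and t, so it is useful only for small arguments.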