Approximation and Learning of Convex Superpositions

We present a fairly general method for constructing classes of functions of finite scale-sensitive dimension (the scale-sensitive dimension is a generalization of the Vapnik-Chervonenkis dimension to real-valued functions). The construction is as follows: start from a class $F$ of functions of finite VC dimension, take the convex hull $\mathrm{co}\,F$ of $F$, and then take the closure $\overline{\mathrm{co}}\,F$ of $\mathrm{co}\,F$ in an appropriate sense. As an example, we study in more detail the case where $F$ is the class of threshold functions. It is shown that $\overline{\mathrm{co}}\,F$ includes two important classes of functions: (i) neural networks with one hidden layer and bounded output weights; (ii) the so-called $\Gamma$ class of Barron, which was shown to satisfy a number of interesting approximation and closure properties. We also give an integral representation in the form of a "continuous neural network" which generalizes Barron's. It is shown that the existence of an integral representation is equivalent to both $L^2$ and $L^\infty$ approximability. A preliminary version of this paper was presented at EuroCOLT'95. The main difference from the conference version is the addition of Theorem 7, where we show that a key topological result fails when the VC dimension hypothesis is removed.
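
As a sketch of the first step of this construction, assuming only the standard definition of a convex hull (the symbols $n$, $a_i$, and $f_i$ are introduced here for illustration and are not taken from the paper), the convex hull of $F$ can be written as:

```latex
\mathrm{co}\,F = \Bigl\{ \sum_{i=1}^{n} a_i f_i \;:\; n \ge 1,\ f_i \in F,\ a_i \ge 0,\ \sum_{i=1}^{n} a_i = 1 \Bigr\}
```

When $F$ is the class of threshold functions, each element of $\mathrm{co}\,F$ is a finite weighted sum of threshold units whose weights are nonnegative and sum to one, which suggests the link to one-hidden-layer networks with bounded output weights. The sense in which the closure is taken, and how signed output weights are handled, are left to the paper itself.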
