Improved Rates and Asymptotic Normality for Nonparametric Neural Network Estimators

We obtain an improved approximation rate (in Sobolev norm) of $r^{-1/2-\alpha/(d+1)}$ for a large class of single hidden layer feedforward artificial neural networks (ANNs) with $r$ hidden units and possibly nonsigmoid activation functions, when the target function satisfies certain smoothness conditions. Here, $d$ is the dimension of the domain of the target function, and $\alpha \in (0,1)$ is related to the smoothness of the activation function. When applying this class of ANNs to nonparametrically estimate (train) a general target function using the method of sieves, we obtain new root-mean-square convergence rates of $O_p\bigl([n/\log n]^{-(1+2\alpha/(d+1))/[4(1+\alpha/(d+1))]}\bigr) = o_p(n^{-1/4})$ by letting the number of hidden units $r_n$ increase appropriately with the sample size (number of training examples) $n$. These rates are valid for i.i.d. data as well as for uniform mixing and absolutely regular ($\beta$-mixing) stationary time series data. In addition, the rates are fast enough to deliver root-$n$ asymptotic normality for plug-in estimates of smooth functionals using general ANN sieve estimators. As interesting applications to nonlinear time series, we establish rates for ANN sieve estimators of four different multivariate target functions: a conditional mean, a conditional quantile, a joint density, and a conditional density. We also obtain root-$n$ asymptotic normality results for semiparametric model coefficient and average derivative estimators.
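
To make the sieve procedure concrete, below is a minimal numerical sketch in Python of an ANN sieve regression estimator: a single hidden layer network whose width $r_n$ grows with the sample size $n$, fit to a smooth conditional mean. The names (`ann_sieve_fit`, `r_of_n`), the growth rule for $r_n$, and the random-feature shortcut of fixing the hidden-layer weights are all illustrative assumptions of this sketch; the paper's sieve estimator optimizes over all network weights and derives the optimal growth rate for $r_n$.

```python
import numpy as np

def ann_sieve_fit(X, y, r_n, ridge=1e-6, seed=0):
    """Least-squares fit of a single hidden layer feedforward network
    with r_n hidden units. For simplicity, the hidden-layer weights are
    drawn at random and held fixed (a random-feature shortcut; the
    paper's sieve optimizes over all network weights), so only the
    output layer is estimated, by ridge-stabilized least squares."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(size=(d, r_n))   # hidden-layer weights (held fixed)
    b = rng.normal(size=r_n)        # hidden-layer biases (held fixed)
    H = np.tanh(X @ W + b)          # n x r_n matrix of hidden-unit outputs
    # Output weights: solve (H'H + ridge*I) beta = H'y.
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(r_n), H.T @ y)
    return lambda X_new: np.tanh(X_new @ W + b) @ beta

def r_of_n(n):
    """Illustrative sieve rule: grow the number of hidden units slowly
    with the sample size n. The specific power below is a placeholder,
    not the paper's rate-optimal choice."""
    return max(2, int(round((n / np.log(n)) ** (1.0 / 3.0))))

# Demo: estimate a smooth univariate conditional mean from noisy data.
n = 2000
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(n, 1))
y = np.sin(np.pi * X[:, 0]) + 0.1 * rng.normal(size=n)
f_hat = ann_sieve_fit(X, y, r_n=r_of_n(n))

X_grid = np.linspace(-1.0, 1.0, 200)[:, None]
rmse = np.sqrt(np.mean((f_hat(X_grid) - np.sin(np.pi * X_grid[:, 0])) ** 2))
print(f"n = {n}, r_n = {r_of_n(n)}, grid RMSE = {rmse:.3f}")
```

The essential design point this sketch illustrates is the sieve idea itself: the approximating class is indexed by $r_n$, and letting $r_n$ grow with $n$ at the appropriate rate trades off approximation error against estimation error to achieve the convergence rates stated above.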
