Nonparametric estimation via empirical risk minimization

A general notion of universal consistency of nonparametric estimators is introduced that applies to regression estimation, conditional median estimation, curve fitting, pattern recognition, and learning concepts. General methods for proving consistency of estimators based on minimizing the empirical error are shown. In particular, distribution-free almost sure consistency of neural network estimates and generalized linear estimators is established.
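
As a rough illustration of what minimizing the empirical error and universal consistency mean, the setup can be sketched as follows. The notation is assumed here for illustration and is not taken from the abstract: $(X_1,Y_1),\dots,(X_n,Y_n)$ are i.i.d. samples, $\mathcal{F}_n$ is a candidate class of functions that may grow with $n$, $\ell$ is a loss function (for example absolute or squared error), and $J(f)$ is the expected loss of $f$.

```latex
% Empirical risk minimizer over the class F_n (assumed notation)
m_n \in \arg\min_{f \in \mathcal{F}_n} \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f(X_i), Y_i\bigr)

% Expected loss of a candidate function f
J(f) = \mathbb{E}\,\ell\bigl(f(X), Y\bigr)

% Strong (almost sure) consistency: the excess risk vanishes
J(m_n) - \inf_{f} J(f) \longrightarrow 0 \quad \text{almost surely as } n \to \infty
```

Under this reading, "universal" or "distribution-free" means that the convergence holds for every distribution of $(X, Y)$ in the class considered, rather than only under smoothness or parametric assumptions. The exact loss, function classes, and moment conditions used in the paper may differ from this sketch.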
