A Distribution-Free Theory of Nonparametric Regression

* Why is Nonparametric Regression Important?
* How to Construct Nonparametric Regression Estimates
* Lower Bounds
* Partitioning Estimates
* Kernel Estimates
* k-NN Estimates
* Splitting the Sample
* Cross Validation
* Uniform Laws of Large Numbers
* Least Squares Estimates I: Consistency
* Least Squares Estimates II: Rate of Convergence
* Least Squares Estimates III: Complexity Regularization
* Consistency of Data-Dependent Partitioning Estimates
* Univariate Least Squares Spline Estimates
* Multivariate Least Squares Spline Estimates
* Neural Networks Estimates
* Radial Basis Function Networks
* Orthogonal Series Estimates
* Advanced Techniques from Empirical Process Theory
* Penalized Least Squares Estimates I: Consistency
* Penalized Least Squares Estimates II: Rate of Convergence
* Dimension Reduction Techniques
* Strong Consistency of Local Averaging Estimates
* Semi-Recursive Estimates
* Recursive Estimates
* Censored Observations
* Dependent Observations
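The contents list three local averaging estimates: partitioning, kernel, and k-NN. As a rough illustration only, the pure-Python sketch below evaluates each at a single query point. It assumes a cubic partition of side h, the naive kernel K(u) = 1 if ||u|| <= 1 (else 0), Euclidean distance with ties broken by sample index, and the convention 0/0 := 0 for empty cells or windows. The function names are hypothetical and are not taken from the book.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]


def _ratio(num: float, den: float) -> float:
    # Assumed convention: 0/0 := 0 when no sample point falls in the cell/window.
    return num / den if den != 0 else 0.0


def partitioning_estimate(x: Point, xs: List[Point], ys: List[float], h: float) -> float:
    # Cubic partition with side length h: a point's cell index is floor(coord / h) per axis.
    def cell(p: Point) -> tuple:
        return tuple(math.floor(c / h) for c in p)

    target = cell(x)
    num = den = 0.0
    for xi, yi in zip(xs, ys):
        if cell(xi) == target:  # average the responses that share x's cell
            num += yi
            den += 1.0
    return _ratio(num, den)


def naive_kernel(u: Point) -> float:
    # Indicator of the closed Euclidean unit ball.
    return 1.0 if math.sqrt(sum(c * c for c in u)) <= 1.0 else 0.0


def kernel_estimate(x: Point, xs: List[Point], ys: List[float], h: float,
                    kernel: Callable[[Point], float] = naive_kernel) -> float:
    # Nadaraya-Watson form: kernel-weighted average of the responses, bandwidth h.
    num = den = 0.0
    for xi, yi in zip(xs, ys):
        w = kernel([(a - b) / h for a, b in zip(x, xi)])
        num += w * yi
        den += w
    return _ratio(num, den)


def knn_estimate(x: Point, xs: List[Point], ys: List[float], k: int) -> float:
    # Average of the responses of the k nearest sample points (ties broken by index).
    order = sorted(range(len(xs)), key=lambda i: (math.dist(x, xs[i]), i))
    return sum(ys[i] for i in order[:k]) / k


# Usage on a tiny one-dimensional sample.
xs = [(0.1,), (0.4,), (0.6,), (0.9,)]
ys = [1.0, 2.0, 3.0, 4.0]
print(partitioning_estimate((0.2,), xs, ys, h=0.5))  # 1.5
print(kernel_estimate((0.5,), xs, ys, h=0.15))       # 2.5
print(knn_estimate((0.95,), xs, ys, k=2))            # 3.5
```

All three share the same shape: a weighted average of the observed responses, differing only in how the weights are chosen (cell membership, kernel window, or nearest-neighbor rank).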

[1] Edmund Taylor Whittaker. On a New Method of Graduation, 1922, Proceedings of the Edinburgh Mathematical Society.

[2] Philip T. Maker. The ergodic theorem for a sequence of functions, 1940.

[3] J. Tukey. Non-Parametric Estimation II. Statistically Equivalent Blocks and Tolerance Regions--The Continuous Case, 1947.

[4] M. Rosenblatt. Remarks on Some Nonparametric Estimates of a Density Function, 1956.

[5] T. Broadbent. Measure and Integral, 1957, Nature.

[6] F. Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain, 1958, Psychological Review.

[7] E. Kaplan et al. Nonparametric Estimation from Incomplete Observations, 1958.

[8] J. Tukey. Curves As Parameters, and Touch Estimation, 1961.

[9] J. Orbach. Principles of Neurodynamics. Perceptrons and the Theory of Brain Mechanisms, 1962.

[10] E. Parzen. On Estimation of a Probability Density Function and Mode, 1962.

[11] E. Nadaraya. On Estimating Regression, 1964.

[12] G. S. Watson et al. Smooth regression analysis, 1964.

[13] W. Rudin. Principles of mathematical analysis, 1964.

[14] N. Meyers et al. H = W, 1964, Proceedings of the National Academy of Sciences of the United States of America.

[15] I. J. Schoenberg et al. Spline Functions and the Problem of Graduation, 1964.

[16] C. Quesenberry et al. A nonparametric estimate of a multivariate density function, 1965.

[17] Nils J. Nilsson et al. Learning Machines: Foundations of Trainable Pattern-Classifying Systems, 1965.

[18] Calyampudi R. Rao et al. Linear statistical inference and its applications, 1965.

[19] W. Rudin. Real and complex analysis, 1968.

[20] C. Reinsch. Smoothing by spline functions, 1967.

[21] J. MacQueen. Some methods for classification and analysis of multivariate observations, 1967.

[22] Shun-ichi Amari et al. A Theory of Pattern Recognition, 1968.

[23] J. V. Ryzin et al. On Strong Consistency of Density Estimates, 1969.

[24] T. Wagner et al. Asymptotically optimal discriminant functions for pattern classification, 1969, IEEE Trans. Inf. Theory.

[25] Charles T. Wolverton et al. Recursive Estimates of Probability Densities, 1969, IEEE Transactions on Systems Science and Cybernetics.

[26] Marvin Minsky et al. Perceptrons: An Introduction to Computational Geometry, 1969.

[27] E. Nadaraya. Remarks on Non-Parametric Estimates for Density Functions and Regression Curves, 1970.

[28] H. Robbins et al. A Convergence Theorem for Non Negative Almost Supermartingales and Some Applications, 1971.

[29] Vladimir Vapnik and A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[30] E. Stein et al. Introduction to Fourier Analysis on Euclidean Spaces, 1971.

[31] Hajime Yamato et al. Sequential Estimation of a Continuous Probability Density Function and Mode, 1971.

[32] Norbert Sauer et al. On the Density of Families of Sets, 1972, J. Comb. Theory, Ser. A.

[33] S. Shelah. A combinatorial problem; stability and order for models and theories in infinitary languages, 1972.

[34] J. Cooper. Singular Integrals and Differentiability Properties of Functions, 1973.

[35] M. Stone. Cross-Validatory Choice and Assessment of Statistical Predictions, 1976.

[36] W. Stout. Almost sure convergence, 1974.

[37] D. Ornstein. Ergodic theory, randomness, and dynamical systems, 1974.

[38] G. Wahba et al. A completely automatic french curve: fitting spline functions by cross validation, 1975.

[39] R. Shibata. Selection of the order of an autoregressive model by Akaike's information criterion, 1976.

[40] R. Has’minskiĭ et al. Stochastic Approximation and Recursive Estimation, 1976.

[41] C. J. Stone et al. Consistent Nonparametric Regression, 1977.

[42] A. V. Peterson. Expressing the Kaplan-Meier estimator as a function of empirical subsurvival functions, 1977.

[43] George A. F. Seber et al. Linear regression analysis, 1977.

[44] D. Ornstein. Guessing the next output of a stationary process, 1978.

[45] Peter Craven et al. Smoothing noisy data with spline functions, 1978.

[46] C. Spiegelman et al. Consistent Window Estimation in Nonparametric Regression, 1980.

[47] C. J. Stone et al. Optimal Rates of Convergence for Nonparametric Estimators, 1980.

[48] R. Shibata. An optimal selection of regression variables, 1981.

[49] Rupert G. Miller et al. Survival Analysis, 2022, The SAGE Encyclopedia of Research Design.

[50] Y. Mack et al. Local Properties of k-NN Regression Estimates, 1981.

[51] L. Schumaker. Spline Functions: Basic Theory, 1981.

[52] C. J. Stone et al. Optimal Global Rates of Convergence for Nonparametric Regression, 1982.

[53] Vladimir Vapnik et al. Estimation of Dependences Based on Empirical Data (Springer Series in Statistics), 1982.

[54] A. Krzyżak et al. Almost Everywhere Convergence of Recursive Kernel Regression Function Estimates, 1982.

[55] Prakasa Rao. Nonparametric functional estimation, 1983.

[56] W. Wong. On the Consistency of Cross-Validation in Kernel Nonparametric Regression, 1983.

[57] R. Varga et al. Proof of Theorem 4, 1983.

[58] I. W. Wright. Splines in Statistics, 1983.

[59] M. Rosenblatt et al. Smoothing Splines: Regression, Derivatives and Deconvolution, 1983.

[60] D. Pollard. Convergence of stochastic processes, 1984.

[61] Adam Krzyżak et al. Distribution-free consistency of a nonparametric kernel regression estimate and classification, 1984, IEEE Trans. Inf. Theory.

[62] J. V. Ryzin et al. A Buckley-James-type estimator for the mean with censored data, 1984.

[63] W. Stute. Asymptotic Normality of Nearest Neighbor Regression Function Estimates, 1984.

[64] Ker-Chau Li. Consistency for Cross-Validated Nearest Neighbor Estimates in Nonparametric Regression, 1984.

[65] C. J. Stone et al. Additive Regression and Other Nonparametric Models, 1985.

[66] P. Speckman. Spline Smoothing and Optimal Rates of Convergence in Nonparametric Regression Models, 1985.

[67] S. Geer. A New Approach to Least-Squares Estimation, with Applications, 1986.

[68] Geoffrey E. Hinton et al. Learning internal representations by error propagation, 1986.

[69] Ker-Chau Li et al. Asymptotic optimality of $C_L$ and generalized cross-validation in ridge regression with application to spline smoothing, 1986.

[70] J. Steele. An Efron-Stein inequality for nonsymmetric statistics, 1986.

[71] Adam Krzyżak et al. The rates of convergence of kernel regression estimates and classification rules, 1986, IEEE Trans. Inf. Theory.

[72] James L. McClelland et al. Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations, 1986.

[73] J. Wellner et al. Empirical Processes with Applications to Statistics, 2009.

[74] L. Zhao. Exponential bounds of mean error for the nearest neighbor estimates of regression functions, 1987.

[75] R. Tapia et al. Nonparametric Function Estimation, Modeling, and Simulation, 1987.

[76] D. Pollard et al. $U$-Processes: Rates of Convergence, 1987.

[77] M. J. D. Powell et al. Radial basis functions for multivariable interpolation: a review, 1987.

[78] Ewaryst Rafajłowicz. Nonparametric orthogonal series estimators of regression: A class attaining the optimal convergence rate in $L_2$, 1987.

[79] P. Robinson. Asymptotically efficient estimation in the presence of heteroskedasticity of unknown form, 1987.

[80] Ker-Chau Li et al. Asymptotic Optimality for $C_p, C_L$, Cross-Validation and Generalized Cross-Validation: Discrete Index Set, 1987.

[81] P. Robinson. Root-N-Consistent Semiparametric Regression, 1988.

[82] C. Manski. Identification of Binary Response Models, 1988.

[83] S. Geer. Estimating a Regression Function, 1990.

[84] D. W. Scott et al. Nonparametric Estimation of Probability Densities and Regression Curves, 1988.

[85] John Moody et al. Fast Learning in Networks of Locally-Tuned Processing Units, 1989, Neural Computation.

[86] Harro Walk et al. Convergence of the Robbins-Monro method for linear problems in a Banach space, 1989.

[87] Thomas M. Stoker et al. Semiparametric Estimation of Index Coefficients, 1989.

[88] Colin McDiarmid et al. Surveys in Combinatorics, 1989: On the method of bounded differences, 1989.

[89] D. Pollard. Asymptotics via Empirical Processes, 1989.

[90] F. Girosi et al. Networks for approximation and learning, 1990, Proc. IEEE.

[91] P. Massart. The Tight Constant in the Dvoretzky-Kiefer-Wolfowitz Inequality, 1990.

[92] Halbert White et al. Connectionist nonparametric regression: Multilayer feedforward networks can learn arbitrary mappings, 1990, Neural Networks.

[93] Grace Wahba et al. Spline Models for Observational Data, 1990.

[94] D. Pollard. Empirical Processes: Theory and Applications, 1990.

[95] Adam Krzyżak et al. On estimation of a class of nonlinear systems by the kernel regression estimate, 1990, IEEE Trans. Inf. Theory.

[96] Jooyoung Park et al. Universal Approximation Using Radial-Basis-Function Networks, 1991, Neural Computation.

[97] Ping Zhang. Variable Selection in Nonparametric Regression with Continuous Covariates, 1991.

[98] Adam Krzyżak et al. On exponential bounds on the Bayes risk of the kernel classification rule, 1991, IEEE Trans. Inf. Theory.

[99] M. Pawlak. On the almost everywhere properties of the kernel regression estimate, 1991.

[100] P. Shields et al. Cutting And Stacking. A Method For Constructing Stationary Processes, 1991, Proceedings. 1991 IEEE International Symposium on Information Theory.

[101] Boris Polyak et al. Asymptotic Optimality of the $C_p$-Test for the Orthogonal Series Estimation of Regression, 1991.

[102] M. Talagrand et al. Probability in Banach spaces, 1991.

[103] W. Light. Some Aspects of Radial Basis Function Approximation, 1992.

[104] Adam Krzyżak et al. Global convergence of the recursive kernel regression estimates with applications in classification and nonlinear system estimation, 1992, IEEE Trans. Inf. Theory.

[105] Lixing Zhu et al. A note on the consistent estimator of nonparametric regression constructed by splines, 1992.

[106] W. Newey et al. Kernel Estimation of Partial Means and a General Variance Estimator, 1994, Econometric Theory.

[107] G. Pflug et al. Stochastic approximation and optimization of random systems, 1992.

[108] H. White. Nonparametric Estimation of Conditional Quantiles Using Neural Networks, 1990.

[109] Jan Mielniczuk et al. Consistency of multilayer perceptron regression estimators, 1993, Neural Networks.

[110] Jooyoung Park et al. Approximation and Radial-Basis-Function Networks, 1993, Neural Computation.

[111] S. Yakowitz. Nearest neighbor regression estimation for null-recurrent Markov time series, 1993.

[112] A. Samarov. Exploring Regression Structure Using Nonparametric Functional Estimation, 1993.

[113] Erkki Oja et al. Rival penalized competitive learning for clustering analysis, RBF net, and curve detection, 1993, IEEE Trans. Neural Networks.

[114] Adam Krzyżak. Identification of nonlinear block-oriented systems by the recursive kernel estimate, 1993.

[115] W. Stute et al. The strong law under random censorship, 1993.

[116] A. Tsybakov et al. Minimax theory of image reconstruction, 1993.

[117] Michael H. Neumann et al. On the Efficiency of Wavelet Estimators under Arbitrary Error Distributions, 1994.

[118] J. Rodriguez et al. Problem (1), 1994.

[119] Adam Krzyżak et al. On radial basis function nets and kernel regression: Statistical consistency, convergence rates, and receptive field size, 1994, Neural Networks.

[120] Daniel F. McCaffrey et al. Convergence rates for single hidden layer feedforward networks, 1994, Neural Networks.

[121] C. J. Stone et al. The Use of Polynomial Splines and Their Tensor Products in Multivariate Function Estimation, 1994.

[122] M. Victor Wickerhauser et al. Wavelets: Algorithms and Applications (Yves Meyer), 1994, SIAM Rev.

[123] J. Rodriguez et al. Problem (2), 1994.

[124] M. Talagrand. Sharper Bounds for Gaussian and Empirical Processes, 1994.

[125] W. Wong et al. Convergence Rate of Sieve Estimates, 1994.

[126] Gábor Lugosi et al. Nonparametric estimation via empirical risk minimization, 1995, IEEE Trans. Inf. Theory.

[127] Sanjeev R. Kulkarni et al. Rates of convergence of nearest neighbor estimation under arbitrary sampling, 1995, IEEE Trans. Inf. Theory.

[128] G. Wahba et al. Smoothing spline ANOVA for exponential families, with application to the Wisconsin Epidemiological Study of Diabetic Retinopathy: the 1994 Neyman Memorial Lecture, 1995.

[129] Matthew P. Wand et al. Kernel Smoothing, 1995.

[130] Yoshua Bengio et al. Pattern Recognition and Neural Networks, 1995.

[131] O. Linton et al. A kernel method of estimating structured nonparametric regression based on marginal integration, 1995.

[132] H. N. Mhaskar et al. Neural Networks for Optimal Approximation of Smooth and Analytic Functions, 1996, Neural Computation.

[133] S. Geer et al. Consistency for the least squares estimator in nonparametric regression, 1996.

[134] Peter L. Bartlett et al. Efficient agnostic learning of neural networks with bounded fan-in, 1996, IEEE Trans. Inf. Theory.

[135] Jon A. Wellner et al. Weak Convergence and Empirical Processes: With Applications to Statistics, 1996.

[136] Federico Girosi et al. On the Relationship between Generalization Error, Hypothesis Complexity, and Sample Complexity for Radial Basis Functions, 1996, Neural Computation.

[137] Adam Krzyżak et al. Nonparametric estimation and classification using radial basis function nets and empirical risk minimization, 1996, IEEE Trans. Neural Networks.

[138] G. Lugosi et al. Consistency of Data-driven Histogram Methods for Density Estimation and Classification, 1996.

[139] Adam Krzyżak et al. Radial Basis Function Networks and Complexity Regularization in Function Learning, 2022.

[140] W. Härdle et al. Estimation of additive regression models with known links, 1996.

[141] A. Nobel. Histogram regression estimation using data-dependent partitions, 1996.

[142] Oliver Linton et al. Miscellanea. Efficient estimation of additive nonparametric regression models, 1997.

[143] Sidney J. Yakowitz et al. Weakly convergent nonparametric forecasting of stationary time series, 1997, IEEE Trans. Inf. Theory.

[144] S. Yakowitz et al. Strongly-consistent nonparametric estimation of smooth regression functions for stationary ergodic sequences, 1997, Proceedings of IEEE International Symposium on Information Theory.

[145] Young K. Truong et al. Polynomial splines and their tensor products in extended linear modeling: 1994 Wald memorial lecture, 1997.

[146] M. Ledoux. On Talagrand's deviation inequalities for product measures, 1997.

[147] Michel Talagrand et al. On the Convexified Sauer-Shelah Theorem, 1997, J. Comb. Theory, Ser. B.

[148] Y. Yatracos et al. Rates of convergence of estimates, Kolmogorov's entropy and the dimensionality reduction principle in regression, 1997.

[149] E. F. Schuster et al. On a universal strong law of large numbers for conditional expectations, 1998.

[150] Vladimir Vapnik et al. Statistical learning theory, 1998.

[151] Michael Kohler et al. Nonparametric regression function estimation using interaction least squares splines and complexity regularization, 1998.

[152] Arne Kovac et al. Extending the Scope of Wavelet Regression Methods by Coefficient-Dependent Thresholding, 2000.

[153] Xiaotong Shen. On the Method of Penalization, 1998.

[154] G. Lugosi et al. Adaptive Model Selection Using Empirical Complexities, 1998.

[155] J. Simonoff. Smoothing Methods in Statistics, 1998.

[156] M. Kohler. Universally Consistent Regression Function Estimation Using Hierarchial B-Splines, 1999.

[157] Andrew C. Singer et al. Universal linear prediction by model order weighting, 1999, IEEE Trans. Signal Process.

[158] Manfred K. Warmuth et al. Averaging Expert Predictions, 1999, EuroCOLT.

[159] A. Nobel. Limits to classification and regression estimation from ergodic processes, 1999.

[160] M. Kohler. Inequalities for uniform deviations of averages from expectations with applications to nonparametric regression, 2000.

[161] Ron Meir et al. On the near optimality of the stochastic approximation of smooth functions by neural networks, 2000, Adv. Comput. Math.

[162] S. Geer. Empirical Processes in M-Estimation, 2000.

[163] Colin L. Mallows et al. Some Comments on $C_p$, 2000, Technometrics.

[164] Adam Krzyżak et al. Nonparametric regression estimation using penalized least squares, 2001, IEEE Trans. Inf. Theory.

[165] A. Krzyżak et al. Convergence and rates of convergence of radial basis functions networks in function learning, 2001.

[166] A. Krzyżak et al. Nonlinear function learning using optimal radial basis function networks, 2001, Proceedings. 2001 IEEE International Symposium on Information Theory (IEEE Cat. No.01CH37252).

[167] Harro Walk. Strong Universal Pointwise Consistency of Recursive Regression Estimates, 2001.

[168] M. Kohler. Universal Consistency of Local Polynomial Kernel Regression Estimates, 2002.

[169] Harro Walk. On cross-validation in kernel and partitioning regression estimation, 2002.

[170] Gábor Lugosi et al. Pattern Classification and Learning Theory, 2002.

[171] A. Krzyżak et al. Application of structural risk minimization to multivariate smoothing spline regression estimates, 2002.

[172] Andrea Rusnock et al. Statistics on the table: The history of statistical concepts and methods, 2002.

[173] Michael Kohler et al. Prediction from Randomly Right Censored Data, 2002.

[174] Karl Rihaczek et al. 1. What Is Data Mining?, 2019, Data Mining for the Social Sciences.

[175] Xiaohong Chen et al. Estimation of Semiparametric Models When the Criterion Function is Not Smooth, 2002.

[176] A. Krzyżak et al. Strong consistency of automatic kernel regression estimates, 2003.

[177] A. Krzyżak et al. Nonparametric regression estimation by normalized radial basis function networks, 2003, IEEE International Symposium on Information Theory, 2003. Proceedings.

[178] M. Kohler. Nonlinear orthogonal series estimates for random design regression, 2003.

[179] Thomas M. Stoker. Equivalence of Direct, Indirect and Slope Estimators of Average Derivatives, 2022.