Residual variance estimation in machine learning

The problem of residual variance estimation consists of estimating, from a finite sample of data, the best possible generalization error obtainable by any model. Even though it is a natural generalization of linear correlation, residual variance estimation in its general form has attracted relatively little attention in machine learning. In this paper, we examine four different residual variance estimators and analyze their properties both theoretically and experimentally to better understand their applicability to machine learning problems. In contrast to previous work, which concentrates on homoscedastic and additive noise, the theoretical treatment is based on a general formulation of the problem that also covers heteroscedastic noise. In the second part of the paper, we demonstrate practical applications in input and model structure selection. The experimental results show that using residual variance estimators in these tasks often gives good results at a reduced computational cost, and that the nearest neighbor estimators are simple and easy to implement.
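To make the idea of a nearest neighbor residual variance estimator concrete, the following is a minimal sketch of the first-nearest-neighbor difference estimator, often called the Delta test in the literature. It is an illustration only: the abstract does not name the four estimators studied, so this should not be read as the paper's own method. The function name `delta_test` and the synthetic data are assumptions made for the example.

```python
import numpy as np


def delta_test(X, y):
    """Estimate the residual variance as half the mean squared difference
    between each output and the output of its nearest neighbor in input space.

    Hypothetical helper for illustration; brute-force O(n^2) neighbor search.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D array as n samples of one input variable

    # Pairwise squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)  # a point must not be its own neighbor

    nn = np.argmin(d2, axis=1)  # index of each point's nearest neighbor
    return 0.5 * np.mean((y - y[nn]) ** 2)


if __name__ == "__main__":
    # Synthetic example (assumed setup): smooth function plus Gaussian noise
    # with standard deviation 0.3, i.e. a true residual variance of 0.09.
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(1000, 2))
    y = np.sin(2 * np.pi * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.3, size=1000)
    print("estimated residual variance:", delta_test(X, y))
```

For input selection, one would typically evaluate such an estimate on different subsets of the columns of `X` and prefer the subset with the smallest value. The brute-force distance matrix keeps the sketch short; a tree-based neighbor search would be the usual choice for larger samples.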
