One of the key problems in machine learning theory and practice is setting the correct value of the regularization parameter; this is particularly crucial in kernel machines such as Support Vector Machines, Regularized Least Squares, or neural networks with weight-decay terms. Well-known methods such as Leave-One-Out (or GCV) and Evidence Maximization offer a way of predicting the regularization parameter. This work points out the failure of these methods in predicting the regularization parameter for the apparently trivial, and here introduced, regularized mean problem; this is the simplest form of Tikhonov regularization, which, in turn, is the primal form of the Regularized Least Squares learning algorithm. This controlled environment makes it possible to define oracular notions of regularization and to experiment with new methodologies for predicting the regularization parameter that can be extended to the more general regression case. The analysis stems from James-Stein theory, shows the equivalence of shrinking and regularization, and is carried out using multiple kernel learning for regression and SVD analysis; a mean value estimator is built, first via a rational function and then via a balanced neural network architecture suitable for estimating statistical quantities and obtaining symmetric expectations. The results show that a non-linear analysis of the sample and a non-linear estimation of the mean obtained by neural networks can be profitably used to improve the accuracy of mean value estimation, especially when only a small number of realizations is available.
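As a minimal sketch (assuming the standard Tikhonov form; the paper's exact formulation may differ, and the notation \lambda, \bar{x}, \sigma^2 is introduced here for illustration), the regularized mean problem and its shrinkage interpretation can be written as

\hat{\mu}_{\lambda} \;=\; \arg\min_{\mu}\; \sum_{i=1}^{n} (x_i - \mu)^2 + \lambda \mu^2 \;=\; \frac{1}{n+\lambda}\sum_{i=1}^{n} x_i \;=\; \frac{\bar{x}}{1+\lambda/n},

so the regularized estimate shrinks the sample mean \bar{x} toward zero by the factor n/(n+\lambda), which is the equivalence of shrinking and regularization referred to above. For i.i.d. samples with variance \sigma^2, the oracle value minimizing E[(\hat{\mu}_{\lambda}-\mu)^2] is \lambda^{\star} = \sigma^2/\mu^2; it depends on the unknown \mu itself, which is what motivates data-driven estimators of the regularization parameter.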