A fast ‘Monte-Carlo cross-validation’ procedure for large least squares problems with noisy data

Summary. We propose a fast Monte-Carlo algorithm for calculating reliable estimates of the trace of the influence matrix $A_\tau$ involved in the regularization of linear equations or in data smoothing problems, where $\tau$ is the regularization or smoothing parameter. The general algorithm is simply as follows: i) generate $n$ pseudo-random values $w_1, \dots, w_n$ from the standard normal distribution (where $n$ is the number of data points) and let $w = (w_1, \dots, w_n)^T$; ii) compute the residual vector $w - A_\tau w$; iii) take the ‘normalized’ inner product $w^T(w - A_\tau w) / (w^T w)$ as an approximation to $(1/n)\operatorname{tr}(I - A_\tau)$. We show, both by theoretical bounds and by numerical simulations on some typical problems, that the expected relative precision of these estimates is very good when $n$ is large enough, and that they can be used in practice to minimize the well-known Generalized Cross-Validation (GCV) function with respect to $\tau$. This permits the use of the GCV method for choosing $\tau$ in any large-scale application, with roughly the same amount of work as the standard residual method. Numerical applications of this procedure to optimal spline smoothing in one or two dimensions show its efficiency.
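The three steps, and their use inside GCV, can be sketched as follows. This is a minimal plain-Python illustration, not the paper's implementation. The names `make_apply_A`, `randomized_trace`, `gcv` and `choose_tau` are hypothetical, and the diagonal operator $A_\tau = \mathrm{diag}\big(1/(1+\tau\lambda_i)\big)$ is an assumed stand-in for a real smoother written in its eigenbasis. In practice `apply_A` would solve the regularized system for a given vector. The estimator works because $E[w^T(I - A_\tau)w] = \operatorname{tr}(I - A_\tau)$ for $w \sim N(0, I_n)$, while $w^T w$ concentrates around $n$. The GCV score used here is the standard $V(\tau) = \frac{1}{n}\|(I - A_\tau)y\|^2 / \big[\frac{1}{n}\operatorname{tr}(I - A_\tau)\big]^2$, with its denominator replaced by the Monte-Carlo estimate.

```python
import random

def make_apply_A(tau, lam):
    # Hypothetical stand-in for a smoother: A_tau = diag(1 / (1 + tau * lam_i)),
    # i.e. an influence matrix expressed in its own eigenbasis. A real
    # application would instead solve the regularized system for v.
    return lambda v: [vi / (1.0 + tau * li) for vi, li in zip(v, lam)]

def randomized_trace(apply_A, w):
    # Steps ii) and iii): residual w - A_tau w, then the normalized inner
    # product w^T (w - A_tau w) / (w^T w), an estimate of (1/n) tr(I - A_tau).
    Aw = apply_A(w)
    num = sum(wi * (wi - ai) for wi, ai in zip(w, Aw))
    den = sum(wi * wi for wi in w)
    return num / den

def gcv(apply_A, y, trace_term):
    # V(tau) = (1/n) ||(I - A_tau) y||^2 / [(1/n) tr(I - A_tau)]^2,
    # where trace_term stands in for (1/n) tr(I - A_tau).
    n = len(y)
    Ay = apply_A(y)
    rss = sum((yi - ai) ** 2 for yi, ai in zip(y, Ay)) / n
    return rss / trace_term ** 2

def choose_tau(y, lam, taus, seed=0):
    # Step i): draw w ~ N(0, I_n) once. Reusing the same w for every tau is an
    # assumption of this sketch; it keeps the estimated GCV curve smooth in tau.
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(len(y))]
    scores = []
    for tau in taus:
        A = make_apply_A(tau, lam)
        scores.append(gcv(A, y, randomized_trace(A, w)))
    best = min(range(len(taus)), key=scores.__getitem__)
    return taus[best], scores

if __name__ == "__main__":
    # Synthetic example: decaying "signal" coefficients plus Gaussian noise.
    n = 2000
    lam = [1e6 * (i / n) ** 4 for i in range(n)]
    rng = random.Random(1)
    y = [10.0 / (1 + i) ** 2 + rng.gauss(0.0, 0.1) for i in range(n)]
    tau, scores = choose_tau(y, lam, [10.0 ** k for k in range(-6, 3)])
    print("selected tau:", tau)
```

Each GCV evaluation costs one application of $A_\tau$ to $y$ and one to $w$, which is why the randomized criterion needs about the same work as a residual computation rather than the $n$ applications an exact trace would require.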
