Wavelet regression by cross-validation

This paper is about using wavelets for regression. The main aim of the paper is to introduce and develop a cross-validation method for selecting a wavelet regression threshold that produces good estimates with respect to L2 error. The selected threshold determines which coefficients to keep in an orthogonal wavelet expansion of noisy data and acts in a similar way to a smoothing parameter in non-parametric regression. The paper gives a very brief introduction to wavelets and to how the discrete wavelet transform with a thresholding rule can be used to estimate functions from noisy data. The paper examines the integrated square error (ISE) of a soft-thresholded wavelet estimator as a function of the threshold and shows that the ISE is almost always convex and almost always has negative first derivative at a zero threshold. This implies that it is almost always possible to find a unique minimum of the ISE. A cross-validation score is introduced to estimate the ISE of a soft-thresholded estimator. The score is based on splitting the data into two sets, one consisting of the odd-indexed observations and the other of the even-indexed ones. Such a splitting is convenient because the discrete wavelet transform, which is involved in computing a regression, operates only on data sets whose size is a power of two. The theoretical behaviour of the cross-validation score as a function of the threshold is investigated and compared to the true ISE, which it closely resembles. The penultimate part of this paper compares the cross-validation procedure with other threshold-choice methods for three functions and three different noise structures. The comparisons suggest that for sparse signals the cross-validation method successfully estimates the "true" ISE and can outperform other methods. Comparisons using non-normal noise show that the cross-validated choice is more resilient to outliers than other methods.
However, all of the methods we used were unable to find exactly the "true" threshold when the data were correlated.
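
The odd/even splitting idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the Haar wavelet, the periodic wrap-around, the neighbour-averaging used to predict the held-out points, the grid search, and all function names (`haar_dwt`, `estimate`, `cv_score`, `cv_threshold`) are assumptions made for illustration, and any adjustment of the selected threshold for the halved sample size is left out.

```python
import math
import random

def haar_dwt(x):
    """Full orthonormal Haar transform; len(x) must be a power of two."""
    approx, details = list(x), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, details  # details[0] holds the finest-scale coefficients

def haar_idwt(approx, details):
    """Inverse of haar_dwt."""
    x = list(approx)
    for level in reversed(details):
        rebuilt = []
        for s, w in zip(x, level):
            rebuilt += [(s + w) / math.sqrt(2), (s - w) / math.sqrt(2)]
        x = rebuilt
    return x

def soft(d, t):
    """Soft thresholding: shrink d towards zero by t."""
    return math.copysign(max(abs(d) - t, 0.0), d)

def estimate(y, t):
    """Soft-threshold every detail coefficient (assumed rule) and invert."""
    approx, details = haar_dwt(y)
    return haar_idwt(approx, [[soft(d, t) for d in lvl] for lvl in details])

def cv_score(y, t):
    """Fit each half of the data and predict the other half from it."""
    even, odd = y[0::2], y[1::2]
    m = len(even)
    f_even, f_odd = estimate(even, t), estimate(odd, t)
    score = 0.0
    for j in range(m):
        # Point 2j+1 lies between even points 2j and 2j+2 (periodic wrap).
        pred_odd = 0.5 * (f_even[j] + f_even[(j + 1) % m])
        # Point 2j lies between odd points 2j-1 and 2j+1 (periodic wrap).
        pred_even = 0.5 * (f_odd[j - 1] + f_odd[j])
        score += (pred_odd - odd[j]) ** 2 + (pred_even - even[j]) ** 2
    return score

def cv_threshold(y, grid):
    """Pick the candidate threshold with the smallest cross-validation score."""
    return min(grid, key=lambda t: cv_score(y, t))

# Hypothetical test signal: a single block plus Gaussian noise.
random.seed(0)
n = 256
signal = [4.0 if 64 <= i < 128 else 0.0 for i in range(n)]
y = [s + random.gauss(0.0, 1.0) for s in signal]
grid = [0.25 * k for k in range(21)]  # candidate thresholds 0, 0.25, ..., 5
t_cv = cv_threshold(y, grid)
fit = estimate(y, t_cv)
```

Each half has 128 points, so both halves remain valid inputs to the transform, which is the convenience the splitting is chosen for. A grid search is used here only for simplicity; any one-dimensional minimiser could replace it.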