Regularization in statistics

This paper is a selective review of the regularization methods scattered across the statistics literature. We introduce a general conceptual approach to regularization and fit most existing methods into it. We focus on the importance of regularization when dealing with today's high-dimensional objects: data and models. A wide range of examples is discussed, including nonparametric regression, boosting, covariance matrix estimation, principal component estimation, and subsampling.

S. Geer | P. Bickel | A. Tsybakov | Bin Yu | Jianqing Fan | A. V. D. Vaart | Bo Li | T. Valdés | C. Rivero | A. Vaart

[1] E. Wigner. Characteristic Vectors of Bordered Matrices with Infinite Dimensions I , 1955 .

[2] M. Rosenblatt. Remarks on Some Nonparametric Estimates of a Density Function , 1956 .

[3] E. Parzen. On Estimation of a Probability Density Function and Mode , 1962 .

[4] E. Nadaraya. On Estimating Regression , 1964 .

[5] G. S. Watson,et al. Smooth regression analysis , 1964 .

[6] E. Nadaraya. On Non-Parametric Estimates of Density Functions and Regression Curves , 1965 .

[7] J. Hodges. Efficiency in normal samples and tolerance of extreme values for some estimates of location , 1967 .

[8] H. Akaike. Statistical predictor identification , 1970 .

[9] A. E. Hoerl,et al. Ridge Regression: Applications to Nonorthogonal Problems , 1970 .

[10] C. L. Mallows. Some comments on $C_p$ , 1973 .

[11] W. Strawderman. The Generalized Jackknife Statistic , 1973 .

[12] M. Stone. Cross‐Validatory Choice and Assessment of Statistical Predictions , 1974 .

[13] H. L. Gray,et al. The Generalised Jackknife Statistic , 1974 .

[14] G. Wahba. Smoothing noisy data with spline functions , 1975 .

[15] Farhad Mehran,et al. The Generalized Jackknife Statistic , 1975 .

[16] S. Ross. The arbitrage theory of capital asset pricing , 1976 .

[17] M. Stone,et al. Cross‐Validatory Choice and Assessment of Statistical Predictions , 1976 .

[18] G. Schwarz. Estimating the Dimension of a Model , 1978 .

[19] K. Wachter. The Strong Limits of Random Matrix Spectra for Sample Matrices of Independent Elements , 1978 .

[20] Peter Craven,et al. Smoothing noisy data with spline functions , 1978 .

[21] Luc Devroye,et al. Distribution-free performance bounds for potential function rules , 1979, IEEE Trans. Inf. Theory.

[22] Mario Bertero,et al. The Stability of Inverse Problems , 1980 .

[23] Frederick R. Forst,et al. On robust estimation of the location parameter , 1980 .

[24] D. Freedman,et al. Some Asymptotic Theory for the Bootstrap , 1981 .

[25] M. Rothschild,et al. Arbitrage, Factor Structure, and Mean-Variance Analysis on Large Asset Markets , 1982 .

[26] Jorma Rissanen,et al. Universal coding, information, prediction, and estimation , 1984, IEEE Trans. Inf. Theory.

[27] Ker-Chau Li,et al. From Stein's Unbiased Risk Estimates to the Method of Generalized Cross Validation , 1985 .

[28] Ker-Chau Li,et al. Asymptotic optimality of $C_L$ and generalized cross-validation in ridge regression with application to spline smoothing , 1986 .

[29] Ker-Chau Li,et al. Asymptotic Optimality for $C_p, C_L$, Cross-Validation and Generalized Cross-Validation: Discrete Index Set , 1987 .

[30] Hung Chen,et al. Convergence Rates for Parametric Components in a Partly Linear Model , 1988 .

[31] H. Künsch. The Jackknife and the Bootstrap for General Stationary Observations , 1989 .

[32] G. Wahba. Spline models for observational data , 1990 .

[33] Leo Breiman,et al. Robust confidence bounds for extreme upper quantiles , 1990 .

[34] E. Mammen. When does bootstrap work , 1992 .

[35] E. Mammen. When Does Bootstrap Work?: Asymptotic Results and Simulations , 1992 .

[36] C. Stein,et al. Estimation with Quadratic Loss , 1992 .

[37] P. Bickel. Efficient and Adaptive Estimation for Semiparametric Models , 1993 .

[38] E. Fama,et al. Common risk factors in the returns on stocks and bonds , 1993 .

[39] George G. Lorentz,et al. Constructive Approximation , 1993, Grundlehren der mathematischen Wissenschaften.

[40] D. Cox. An Analysis of Bayesian Inference for Nonparametric Regression , 1993 .

[41] I. Johnstone,et al. Ideal spatial adaptation by wavelet shrinkage , 1994 .

[42] Joseph P. Romano,et al. Large Sample Confidence Regions Based on Subsamples under Minimal Assumptions , 1994 .

[43] Jianqing Fan,et al. Local polynomial modelling and its applications , 1994 .

[44] K. Do,et al. Efficient and Adaptive Estimation for Semiparametric Models , 1994 .

[45] Danny Kopec,et al. Additional References , 2003 .

[46] P. Hall,et al. On blocking rules for the bootstrap with dependent data , 1995 .

[47] L. Wasserman,et al. A Reference Bayesian Test for Nested Hypotheses and its Relationship to the Schwarz Criterion , 1995 .

[48] Somnath Datta,et al. Bootstrap Inference for a First-Order Autoregression with Positive Innovations , 1995 .

[49] C. Mallows. More comments on $C_p$ , 1995 .

[50] László Györfi,et al. A Probabilistic Theory of Pattern Recognition , 1996, Stochastic Modelling and Applied Probability.

[51] R. Tibshirani. Regression Shrinkage and Selection via the Lasso , 1996 .

[52] L. Breiman. Heuristics of instability and stabilization in model selection , 1996 .

[53] James M. Robins,et al. Causal Inference from Complex Longitudinal Data , 1997 .

[54] P. Massart,et al. From Model Selection to Adaptive Estimation , 1997 .

[55] Young K. Truong,et al. Polynomial splines and their tensor products in extended linear modeling: 1994 Wald memorial lecture , 1997 .

[56] Maia Berkane. Latent Variable Modeling and Applications to Causality , 1997 .

[57] J. Shao. An Asymptotic Theory for Linear Model Selection , 1997 .

[58] E. Mammen. The Bootstrap and Edgeworth Expansion , 1997 .

[59] I. Johnstone,et al. Minimax estimation via wavelet shrinkage , 1998 .

[60] Vladimir Vapnik,et al. Statistical learning theory , 1998 .

[61] N. Draper,et al. Applied Regression Analysis , 1998 .

[62] G. Lugosi,et al. Adaptive Model Selection Using Empirical Complexities , 1998 .

[63] Michael A. Saunders,et al. Atomic Decomposition by Basis Pursuit , 1998, SIAM J. Sci. Comput.

[64] A. Böttcher,et al. Introduction to Large Truncated Toeplitz Matrices , 1998 .

[65] G. Lugosi,et al. On Prediction of Individual Sequences , 1998 .

[66] C. H. Oh,et al. Some comments on , 1998 .

[67] P. Massart,et al. Risk bounds for model selection via penalization , 1999 .

[68] Bruno Torrésani,et al. Time-Frequency and Time-Scale Analysis , 1999 .

[69] D. Freedman. On the Bernstein-von Mises Theorem with Infinite Dimensional Parameters , 1999 .

[70] M. Pourahmadi. Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation , 1999 .

[71] Richard F. Gunst,et al. Applied Regression Analysis , 1999, Technometrics.

[72] E. Mammen,et al. Smooth Discrimination Analysis , 1999 .

[73] Arthur E. Hoerl,et al. Ridge Regression: Biased Estimation for Nonorthogonal Problems , 2000, Technometrics.

[74] Hoon Kim,et al. Monte Carlo Statistical Methods , 2000, Technometrics.

[75] Yuhong Yang. Mixing Strategies for Density Estimation , 2000 .

[76] Arkadi Nemirovski,et al. Topics in Non-Parametric Statistics , 2000 .

[77] M. Pourahmadi. Maximum likelihood estimation of generalised linear models for multivariate normal covariance matrix , 2000 .

[78] A. Juditsky,et al. Functional aggregation for nonparametric regression , 2000 .

[79] Gregory Piatetsky-Shapiro,et al. High-Dimensional Data Analysis: The Curses and Blessings of Dimensionality , 2000 .

[80] A. W. van der Vaart,et al. On Profile Likelihood , 2000 .

[81] Colin L. Mallows,et al. Some Comments on Cp , 2000, Technometrics.

[82] A. V. D. Vaart,et al. Convergence rates of posterior distributions , 2000 .

[83] Peter L. Bartlett,et al. Functional Gradient Techniques for Combining Hypotheses , 2000 .

[84] P. Bickel,et al. Non- and semiparametric statistics: compared and contrasted , 2000 .

[85] I. Johnstone. On the distribution of the largest eigenvalue in principal components analysis , 2001 .

[86] F. Götze,et al. Adaptive choice of bootstrap sample sizes , 2001 .

[87] Jianqing Fan,et al. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties , 2001 .

[88] Arnold J Stromberg,et al. Subsampling , 2001, Technometrics.

[89] P. Massart,et al. Gaussian model selection , 2001 .

[90] Jianqing Fan,et al. Generalized likelihood ratio statistics and Wilks phenomenon , 2001 .

[91] Jianqing Fan,et al. Regularization of Wavelet Approximations , 2001 .

[92] Sophie Lambert-Lacroix,et al. On nonparametric confidence set estimation , 2001 .

[93] I. Daubechies,et al. Tree Approximation and Optimal Encoding , 2001 .

[94] O. Lepski,et al. Random rates in anisotropic regression (with a discussion and a rejoinder by the authors) , 2002 .

[95] André Elisseeff,et al. Stability and Generalization , 2002, J. Mach. Learn. Res.

[96] L. Györfi,et al. A Distribution-Free Theory of Nonparametric Regression (Springer Series in Statistics) , 2002 .

[97] R. Kohn,et al. Parsimonious Covariance Matrix Estimation for Longitudinal Data , 2002 .

[98] S. Dudoit,et al. Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data , 2002 .

[99] R. Tibshirani,et al. Diagnosis of multiple cancer types by shrunken centroids of gene expression , 2002, Proceedings of the National Academy of Sciences of the United States of America.

[100] Adam Krzyzak,et al. A Distribution-Free Theory of Nonparametric Regression , 2002, Springer series in statistics.

[101] S. Ghosal,et al. On Bayesian Adaptation , 2003 .

[102] Gerard Kerkyacharian,et al. Entropy, Universal Coding, Approximation, and Bases Properties , 2003 .

[103] E. Belitser,et al. Adaptive Bayesian inference on the mean of an infinite-dimensional normal distribution , 2003 .

[104] Shie Mannor,et al. Greedy Algorithms for Classification -- Consistency, Convergence Rates, and Adaptivity , 2003, J. Mach. Learn. Res.

[105] G. Lugosi,et al. On the Bayes-risk consistency of regularized boosting methods , 2003 .

[106] A. Tsybakov,et al. Optimal aggregation of classifiers in statistical learning , 2003 .

[107] Alexandre B. Tsybakov,et al. Optimal Rates of Aggregation , 2003, COLT.

[108] James M. Robins,et al. Unified Methods for Censored Longitudinal Data and Causality , 2003 .

[109] M. Pourahmadi,et al. Nonparametric estimation of large covariance matrices of longitudinal data , 2003 .

[110] Eric R. Ziegel,et al. The Elements of Statistical Learning , 2003, Technometrics.

[111] T. Cai,et al. An adaptation theory for nonparametric confidence intervals , 2004, math/0503662.

[112] Jianqing Fan,et al. Nonconcave penalized likelihood with a diverging number of parameters , 2004, math/0406466.

[113] R. Tibshirani,et al. Least angle regression , 2004, math/0406456.

[114] S. Keleş,et al. Asymptotically optimal model selection method with right censored outcomes , 2004 .

[115] T. Valdés,et al. Mean‐Based Iterative Procedures in Linear Models with General Errors and Grouped Data , 2004 .

[116] Meta M. Voelker,et al. Variable Selection and Model Building via Likelihood Basis Pursuit , 2004 .

[117] Tzee-Ming Huang. Convergence rates for posterior distributions and adaptive estimation , 2004, math/0410087.

[118] Y. Ritov,et al. Persistence in high-dimensional linear predictor selection and the virtue of overparametrization , 2004 .

[119] Yuhong Yang. Aggregating regression procedures to improve performance , 2004 .

[120] B. Ripley,et al. Robust Statistics , 2018, Encyclopedia of Mathematical Geosciences.

[121] Olivier Ledoit,et al. A well-conditioned estimator for large-dimensional covariance matrices , 2004 .

[122] P. Bickel,et al. Some theory for Fisher's linear discriminant function , 2004 .

[123] D. Paul,et al. Asymptotics of the leading sample eigenvalues for a spiked covariance model , 2004 .

[124] B. Efron. The Estimation of Prediction Error , 2004 .

[125] Jianqing Fan,et al. Removing intensity effects and identifying significant genes for Affymetrix arrays in macrophage migration inhibitory factor-suppressed neuroblastoma cells , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[126] S. Dudoit,et al. Asymptotics of cross-validated risk estimation in estimator selection and performance assessment , 2005 .

[127] Alexander V. Nazin,et al. Recursive Aggregation of Estimators by the Mirror Descent Algorithm with Averaging , 2005, Probl. Inf. Transm.

[128] P. Bickel,et al. On the Choice of m in the m Out of n Bootstrap and its Application to Confidence Bounds for Extreme Percentiles , 2005 .

[129] Bin Yu,et al. Boosting with early stopping: Convergence and consistency , 2005, math/0508276.

[130] M. Kosorok,et al. Marginal asymptotics for the “large $p$, small $n$” paradigm: With applications to microarray data , 2005, math/0508219.

[131] T. Tony Cai,et al. On Adaptive Estimation of Linear Functionals , 2005 .

[132] Christian P. Robert,et al. Monte Carlo Statistical Methods (Springer Texts in Statistics) , 2005 .

[133] D. Hunter,et al. Variable Selection using MM Algorithms , 2005, Annals of statistics.

[134] J. Robins,et al. Robust inference with higher order influence functions: Part I, Part II , 2005 .

[135] BOOSTING WITH EARLY STOPPING: CONVERGENCE , 2005 .

[136] Jianqing Fan,et al. Nonparametric Inferences for Additive Models , 2005 .

[137] I. Johnstone,et al. Empirical Bayes selection of wavelet thresholds , 2005, math/0508281.

[138] H. Zou,et al. Regularization and variable selection via the elastic net , 2005 .

[139] S. Boucheron,et al. Theory of classification : a survey of some recent advances , 2005 .

[140] Jianqing Fan,et al. Semilinear High-Dimensional Model for Normalization of Microarray Data , 2005 .

[141] M. Wegkamp,et al. Consistent variable selection in high dimensional regression via multiple testing , 2006 .

[142] Jussi Klemelä. Density estimation with stagewise optimization of the empirical risk , 2006, Machine Learning.

[143] J. Robins,et al. Adaptive nonparametric confidence sets , 2006, math/0605473.

[144] Jianhua Z. Huang,et al. Covariance matrix selection and estimation via penalised normal likelihood , 2006 .

[145] Runze Li,et al. Statistical Challenges with High Dimensionality: Feature Selection in Knowledge Discovery , 2006, math/0602133.

[146] Ronald A. DeVore,et al. Approximation Methods for Supervised Learning , 2006, Found. Comput. Math.

[147] Peter Bühlmann. Boosting for high-dimensional linear models , 2006, math/0606789.

[148] P. Bühlmann,et al. Sparse Boosting , 2006, J. Mach. Learn. Res.

[149] R. Tibshirani,et al. Prediction by Supervised Principal Components , 2006 .

[150] V. Koltchinskii. Local Rademacher complexities and oracle inequalities in risk minimization , 2006, 0708.0083.

[151] P. Bühlmann. Boosting for High-Dimensional Linear Models , 2006 .

[152] P. Bickel,et al. Some Theory for Generalized Boosting Algorithms , 2006, J. Mach. Learn. Res.

[153] E. Greenshtein. Best subset selection, persistence in high-dimensional statistical learning and optimization under $l_1$ constraint , 2006, math/0702684.

[154] Florentina Bunea,et al. Aggregation and sparsity via $\ell_1$ penalized least squares , 2006 .

[155] A. Tsybakov,et al. Aggregation for Gaussian regression , 2007, 0710.3654.

[156] A. V. D. Vaart,et al. Convergence rates of posterior distributions for non-i.i.d. observations , 2007, 0708.0491.

[157] T. Bengtsson,et al. Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants , 2007 .

[158] David Hinkley,et al. Bootstrap Methods: Another Look at the Jackknife , 2008 .

[159] P. Bickel,et al. Regularized estimation of large covariance matrices , 2008, 0803.1909.

[160] A. Juditsky,et al. Learning by mirror averaging , 2005, math/0511468.