Regularization in statistics

This paper is a selective review of the regularization methods scattered throughout the statistics literature. We introduce a general conceptual approach to regularization and fit most existing methods into it. We focus in particular on the importance of regularization when dealing with today's high-dimensional objects: data and models. A wide range of examples is discussed, including nonparametric regression, boosting, covariance matrix estimation, principal component estimation, and subsampling.
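To make the idea of a regularization parameter concrete, here is one simple regularizer of the kind used in covariance matrix estimation: banding a sample covariance matrix. Entries more than k positions from the diagonal are set to zero, and the bandwidth k is the regularization parameter. With k = p - 1 you get the raw sample covariance back, and with k = 0 only the variances remain. This is a minimal, stdlib-only sketch written for illustration. The function names `sample_covariance` and `band` are hypothetical. It is not presented as the authors' procedure, and it does not choose k from the data.

```python
from typing import List

Matrix = List[List[float]]


def sample_covariance(data: Matrix) -> Matrix:
    """Sample covariance with divisor n; rows of `data` are observations, columns are variables."""
    n = len(data)
    p = len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [
        [sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / n for j in range(p)]
        for i in range(p)
    ]


def band(matrix: Matrix, k: int) -> Matrix:
    """Banding regularizer: keep entries with |i - j| <= k and set all others to zero."""
    if k < 0:
        raise ValueError("bandwidth k must be non-negative")
    return [
        [value if abs(i - j) <= k else 0.0 for j, value in enumerate(row)]
        for i, row in enumerate(matrix)
    ]


if __name__ == "__main__":
    # Toy data: 4 observations of 3 variables (illustrative only).
    data = [[1.0, 2.0, 0.0], [2.0, 1.0, 1.0], [3.0, 4.0, 2.0], [4.0, 3.0, 1.0]]
    s = sample_covariance(data)
    for k in (0, 1, 2):
        # Smaller k means stronger regularization: more entries are forced to zero.
        print(k, band(s, k))
```

In practice k would be selected from the data, for example by cross-validation or by a resampling scheme such as subsampling. That selection step is deliberately left out of this sketch.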

S. van de Geer | P. Bickel | A. Tsybakov | Bin Yu | Jianqing Fan | T. Valdés | C. Rivero | Bo Li | A. van der Vaart
