A Penalized Method for the Predictive Limit of Learning

Machine learning systems learn from observed data and use the resulting models to make predictions. Because large models tend to overfit while small models tend to underfit a given dataset, a critical challenge is to select an appropriate model (e.g., a set of variables or features). Model selection aims to strike a balance between goodness of fit and model complexity, and thereby to gain reliable predictive power. In this paper, we study a penalized model selection technique that asymptotically achieves the optimal expected prediction loss (referred to as the limit of learning) offered by a set of candidate models. We prove that the proposed procedure is statistically efficient, in the sense that it asymptotically approaches the limit of learning, and computationally efficient, in the sense that it can be much faster than cross-validation methods. Our theory applies to a wide variety of model classes and loss functions, and to high-dimensional settings in which model complexity can grow with the data size. We release a Python package implementing the proposed method for general use cases such as logistic regression and neural networks.
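To make the general idea concrete, below is a minimal Python sketch of penalized model selection: each candidate model is scored by its training loss plus a complexity penalty, and the model with the smallest penalized score is selected. The specific penalty here (an AIC-like term proportional to the number of features, with a hypothetical `penalty_weight` knob) is purely illustrative and is not the penalty proposed in the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def penalized_select(candidate_feature_sets, X, y, penalty_weight=1.0):
    """Pick the candidate model minimizing (training loss + complexity penalty).

    A generic sketch of penalized model selection; the paper's actual
    penalty term and its scaling are not reproduced here.
    """
    n = len(y)
    best_score, best_features = np.inf, None
    for features in candidate_feature_sets:
        model = LogisticRegression(max_iter=1000).fit(X[:, features], y)
        # In-sample average logistic loss (negative log-likelihood / n).
        proba = model.predict_proba(X[:, features])[:, 1]
        nll = -np.mean(y * np.log(proba) + (1 - y) * np.log(1 - proba))
        # Penalize complexity, here measured by the number of features;
        # the k/n form is an AIC-style choice used only for illustration.
        score = nll + penalty_weight * len(features) / n
        if score < best_score:
            best_score, best_features = score, features
    return best_features
```

For example, `penalized_select([[0], [0, 1], [0, 1, 2]], X, y)` compares three nested feature sets for a binary response `y`; larger `penalty_weight` values favor smaller models, while `penalty_weight=0` reduces the rule to pure training-loss minimization and invites overfitting.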
