Asymptotically Optimal Regularization in Smooth Parametric Models

Many types of regularization schemes have been employed in statistical learning, each motivated by some assumption about the problem domain. In this paper, we present a unified asymptotic analysis of smooth regularizers, which allows us to see how the validity of these assumptions impacts the success of a particular regularizer. In addition, our analysis motivates an algorithm for optimizing regularization parameters, which in turn can be analyzed within our framework. We apply our analysis to several examples, including hybrid generative-discriminative learning and multi-task learning.
