Oracle Inequalities for Local and Global Empirical Risk Minimizers

The aim of this chapter is to provide an overview of general frameworks used to derive (sharp) oracle inequalities. Two extensions of a general theory for convex norm penalized empirical risk minimizers are summarized. The first one is for convex nondifferentiable loss functions. The second is for nonconvex differentiable loss functions. Theoretical understanding is required for the growing number of algorithms in statistics, machine learning, and, more recently, deep learning that are based on (combinations of) these types of loss functions. To motivate the importance of oracle inequalities, the problem of model misspecification in the linear model is first discussed. Then, the sharp oracle inequalities are stated. Finally, we show how to apply the general theory to problems from regression, classification, and dimension reduction.
