Optimal learning with Q-aggregation

We consider a general supervised learning problem with strongly convex and Lipschitz loss and study the problem of model selection aggregation. In particular, given a finite dictionary of functions (learners) together with a prior, we generalize the results obtained by Dai, Rigollet and Zhang (2012) for Gaussian regression with squared loss and fixed design to this learning setup. Specifically, we prove that the Q-aggregation procedure outputs an estimator that satisfies optimal oracle inequalities both in expectation and with high probability. Our proof techniques depart somewhat from traditional proofs in that most of the standard arguments are carried out on the Laplace transform of the empirical process to be controlled.

AMS 2000 subject classifications: Primary 62H25; secondary 62F04, 90C22.

[1]  M. Talagrand,et al.  Probability in Banach Spaces: Isoperimetry and Processes , 1991 .

[2]  M. Talagrand,et al.  Probability in Banach spaces , 1991 .

[3]  Peter L. Bartlett,et al.  The Importance of Convexity in Learning with Squared Loss , 1998, IEEE Trans. Inf. Theory.

[4]  Yuhong Yang Mixing Strategies for Density Estimation , 2000 .

[5]  A. Juditsky,et al.  Functional aggregation for nonparametric regression , 2000 .

[6]  M. Talagrand,et al.  Lectures on Probability Theory and Statistics , 2000 .

[7]  Yuhong Yang Combining Different Procedures for Adaptive Regression , 2000, Journal of Multivariate Analysis.

[8]  A. Tsybakov,et al.  Optimal aggregation of classifiers in statistical learning , 2003 .

[9]  Alexandre B. Tsybakov,et al.  Optimal Rates of Aggregation , 2003, COLT.

[10]  Olivier Catoni,et al.  Statistical learning theory and stochastic optimization , 2004 .

[11]  J. Picard,et al.  Lectures on probability theory and statistics , 2004 .

[12]  Ofer Zeitouni,et al.  Lectures on probability theory and statistics , 2004 .

[13]  J. Hiriart-Urruty,et al.  Fundamentals of Convex Analysis , 2004 .

[14]  Michael I. Jordan,et al.  Convexity, Classification, and Risk Bounds , 2006 .

[15]  G. Lecué Optimal rates of aggregation in classification under low noise assumption , 2006, math/0603447.

[16]  Arnak S. Dalalyan,et al.  Aggregation by Exponential Weighting and Sharp Oracle Inequalities , 2007, COLT.

[17]  A. Tsybakov,et al.  Aggregation for Gaussian regression , 2007, 0710.3654.

[18]  O. Catoni PAC-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning , 2007, 0712.0248.

[19]  Jean-Yves Audibert,et al.  Progressive mixture rules are deviation suboptimal , 2007, NIPS.

[20]  Andrew Zisserman,et al.  Advances in Neural Information Processing Systems (NIPS) , 2007 .

[21]  Guillaume Lecué Suboptimality of Penalized Empirical Risk Minimization in Classification , 2007 .

[22]  A. Juditsky,et al.  Learning by mirror averaging , 2005, math/0511468.

[23]  Arnak S. Dalalyan,et al.  Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity , 2008, Machine Learning.

[24]  Jean-Yves Audibert Fast learning rates in statistical inference through aggregation , 2007, math/0703854.

[25]  Arnak S. Dalalyan,et al.  Sparse Regression Learning by Aggregation and Langevin Monte-Carlo , 2009, COLT.

[26]  Philippe Rigollet,et al.  Kullback-Leibler aggregation and misspecified generalized linear models , 2009, 0911.2919.

[27]  S. Mendelson,et al.  Aggregation via empirical risk minimization , 2009 .

[28]  A. Tsybakov,et al.  Exponential Screening and optimal rates of sparse estimation , 2010, 1003.2654.

[29]  A. Dalalyan,et al.  Sharp Oracle Inequalities for Aggregation of Affine Estimators , 2011, 1104.3969.

[30]  Karim Lounici,et al.  PAC-Bayesian Bounds for Sparse Regression Estimation with Exponential Weights , 2010, 1009.2707.

[31]  V. Koltchinskii,et al.  Oracle inequalities in empirical risk minimization and sparse recovery problems , 2011 .

[32]  S. Mendelson,et al.  Sharper lower bounds on the performance of the empirical risk minimization algorithm , 2011, 1102.4983.

[33]  Yu. I. Ingster,et al.  Statistical inference in compound functional models , 2012, 1208.6402.

[34]  A. Tsybakov,et al.  Sparse Estimation by Exponential Weighting , 2011, 1108.5116.

[35]  Tong Zhang,et al.  Deviation Optimal Learning using Greedy Q-aggregation , 2012, ArXiv.

[36]  Arnak S. Dalalyan,et al.  Mirror averaging with sparsity priors , 2010, 1003.1189.