Efficient improper learning for online logistic regression

We consider the setting of online logistic regression and study the regret with respect to the ℓ2-ball of radius B. It is known (see [Hazan et al., 2014]) that any proper algorithm with logarithmic regret in the number of samples (denoted n) necessarily suffers an exponential multiplicative constant in B. In this work, we design an efficient improper algorithm that avoids this exponential constant while preserving logarithmic regret. Indeed, [Foster et al., 2018] showed that the lower bound does not apply to improper algorithms, but their strategy, based on exponential weights, has prohibitive computational complexity. Our new algorithm, based on regularized empirical risk minimization with surrogate losses, achieves a regret scaling as O(B log(Bn)) with a per-round time complexity of order O(d^2).
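To make the setting concrete, here is a minimal sketch in Python of the online logistic regression protocol: at each round the learner receives features x_t, predicts a score, observes a label y_t in {-1, +1}, and pays the logistic loss; regret is the cumulative loss minus that of the best fixed parameter in the ℓ2-ball of radius B in hindsight. The learner below is a plain (proper) online Newton step used only as a stand-in baseline; it is not the improper regularized-ERM algorithm with surrogate losses proposed in the paper, and the class name, simulation, and constants are illustrative assumptions.

```python
import numpy as np

def logistic_loss(z):
    # log(1 + exp(-z)), computed stably
    return np.logaddexp(0.0, -z)

class OnlineNewtonLogistic:
    """Proper baseline: online Newton step on the logistic loss (stand-in only)."""

    def __init__(self, d, lam=1.0, B=1.0):
        self.w = np.zeros(d)
        self.A = lam * np.eye(d)   # running second-order matrix
        self.B = B

    def predict(self, x):
        return self.w @ x          # score; a sigmoid of this gives P(y = +1 | x)

    def update(self, x, y):
        z = y * (self.w @ x)
        g = -y * x / (1.0 + np.exp(z))        # gradient of the logistic loss at w
        self.A += np.outer(g, g)
        # np.linalg.solve is O(d^3) per round; maintaining A^{-1} with
        # Sherman-Morrison rank-one updates would bring this to O(d^2).
        self.w -= np.linalg.solve(self.A, g)
        norm = np.linalg.norm(self.w)
        if norm > self.B:                     # simplified Euclidean projection onto the B-ball
            self.w *= self.B / norm

# Online protocol and cumulative loss; regret is this total minus the loss
# of the best fixed w in the B-ball in hindsight.
rng = np.random.default_rng(0)
d, n, B = 5, 1000, 10.0
w_star = rng.normal(size=d)
w_star *= B / np.linalg.norm(w_star)
learner = OnlineNewtonLogistic(d, lam=1.0, B=B)
total = 0.0
for t in range(n):
    x = rng.normal(size=d)
    x /= np.linalg.norm(x)
    y = 1 if rng.random() < 1.0 / (1.0 + np.exp(-w_star @ x)) else -1
    total += logistic_loss(y * learner.predict(x))
    learner.update(x, y)
print(f"cumulative logistic loss after {n} rounds: {total:.1f}")
```

The paper's point is precisely that a proper learner like this one cannot achieve logarithmic regret without a constant exponential in B, whereas an improper predictor (one not constrained to output a parameter in the B-ball) can.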

[1] Vladimir Vovk et al. A game of prediction with expert advice, 1995, COLT '95.

[2] Elad Hazan et al. Logistic Regression: Tight Bounds for Stochastic and Online Optimization, 2014, COLT.

[3] Sébastien Bubeck et al. Convex Optimization: Algorithms and Complexity, 2014, Found. Trends Mach. Learn.

[4] Haipeng Luo et al. Logistic Regression: The Importance of Being Improper, 2018, COLT.

[5] Stéphane Gaïffas et al. An improper estimator with optimal excess risk in misspecified density estimation and logistic regression, 2019, ArXiv.

[6] Sébastien Bubeck et al. Sampling from a Log-Concave Distribution with Projected Langevin Monte Carlo, 2015, Discrete & Computational Geometry.

[7] Elad Hazan et al. Introduction to Online Convex Optimization, 2016, Found. Trends Optim.

[8] Manfred K. Warmuth et al. Relative Loss Bounds for On-Line Density Estimation with the Exponential Family of Distributions, 1999, Machine Learning.

[9] J. Berkson. Application of the Logistic Function to Bio-Assay, 1944.

[10] Martin Zinkevich et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.

[11] Manfred K. Warmuth et al. On Weak Learning, 1995, J. Comput. Syst. Sci.

[12] Gilles Stoltz et al. Uniform regret bounds over R^d for the sequential linear regression problem with the square loss, 2018, ALT.

[13] Daniele Calandriello et al. Efficient Second-Order Online Kernel Learning with Adaptive Embedding, 2017, NIPS.

[14] Alessandro Rudi et al. Beyond Least-Squares: Fast Rates for Regularized Empirical Risk Minimization through Self-Concordance, 2019, COLT.

[15] Nishant A. Mehta. From exp-concavity to variance control: High probability O(1/n) rates and high probability online-to-batch conversion, 2016.

[16] H. Brendan McMahan et al. Follow-the-Regularized-Leader and Mirror Descent: Equivalence Theorems and L1 Regularization, 2011, AISTATS.

[17] N. Aronszajn. Theory of Reproducing Kernels, 1950.

[18] Elad Hazan et al. Logarithmic regret algorithms for online convex optimization, 2006, Machine Learning.

[19] Alessandro Rudi et al. Efficient online learning with kernels for adversarial large scale problems, 2019, NeurIPS.

[20] Gábor Lugosi et al. Prediction, learning, and games, 2006.

[21] Nishant Mehta et al. Fast rates with high probability in exp-concave statistical learning, 2016, AISTATS.

[22] V. Vovk. Competitive On-line Statistics, 2001.