Mixability made efficient: Fast online multiclass logistic regression

Mixability has been shown to be a powerful tool for obtaining algorithms with optimal regret. However, the resulting methods often suffer from high computational complexity, which has limited their practical applicability. For example, in the case of multiclass logistic regression, the aggregating forecaster (Foster et al. (2018)) achieves a regret of O(log(Bn)), whereas Online Newton Step achieves O(e^B log(n)), a doubly exponential gain in B (a bound on the norm of the comparison functions). However, this high statistical performance comes at the price of a prohibitive computational complexity of O(n^37). In this paper, we use quadratic surrogates to make aggregating forecasters more efficient. We show that the resulting algorithm still achieves high statistical performance for a large class of losses. In particular, we derive an algorithm for multiclass logistic regression with regret bounded by O(B log(n)) and a computational complexity of only O(n^4).
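
To make the setting concrete, the display below sketches the standard online multiclass logistic regression problem and the quadratic-surrogate idea in generic notation. The symbols (features x_t, labels y_t in {1, ..., K}, parameter theta in R^{Kd} with per-class blocks theta^{(k)}, radius B) are our own illustrative choices, and the paper's actual surrogate and forecaster may differ in their details.

% Multiclass logistic loss incurred at round t by a parameter theta = (theta^{(1)}, ..., theta^{(K)})
\ell_t(\theta) = -\log \frac{\exp(\langle \theta^{(y_t)}, x_t \rangle)}{\sum_{k=1}^{K} \exp(\langle \theta^{(k)}, x_t \rangle)}

% Regret of the iterates \widehat\theta_1, ..., \widehat\theta_n against the best parameter of norm at most B
R_n = \sum_{t=1}^{n} \ell_t(\widehat\theta_t) \;-\; \min_{\|\theta\| \le B} \, \sum_{t=1}^{n} \ell_t(\theta)

% Quadratic surrogate: a second-order expansion of the loss around the current iterate.
% Mixing quadratic losses keeps the aggregating forecaster's weights Gaussian, hence cheap closed-form updates
% (sketch only; the exact surrogate used in the paper may include additional curvature corrections).
\widetilde\ell_t(\theta) = \ell_t(\widehat\theta_t) + \langle \nabla \ell_t(\widehat\theta_t), \theta - \widehat\theta_t \rangle + \tfrac{1}{2} (\theta - \widehat\theta_t)^\top \nabla^2 \ell_t(\widehat\theta_t) \, (\theta - \widehat\theta_t)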

[1] Sham M. Kakade, et al. Online Bounds for Bayesian Algorithms, 2004, NIPS.

[2] Elad Hazan, et al. Logarithmic regret algorithms for online convex optimization, 2006, Machine Learning.

[3] Alessandro Rudi, et al. Efficient online learning with kernels for adversarial large scale problems, 2019, NeurIPS.

[4] Mark D. Reid, et al. Fast rates in statistical and online learning, 2015, J. Mach. Learn. Res.

[5] Alessandro Rudi, et al. Beyond Least-Squares: Fast Rates for Regularized Empirical Risk Minimization through Self-Concordance, 2019, COLT.

[6] Jaouad Mourtada, et al. An improper estimator with optimal excess risk in misspecified density estimation and logistic regression, 2019, J. Mach. Learn. Res.

[7] Dmitrii Ostrovskii, et al. Finite-sample Analysis of M-estimators using Self-concordance, 2018, arXiv:1810.06838.

[8] Francis R. Bach, et al. Self-concordant analysis for logistic regression, 2009, arXiv.

[9] Haipeng Luo, et al. Logistic Regression: The Importance of Being Improper, 2018, COLT.

[10] Manfred K. Warmuth, et al. On Weak Learning, 1995, J. Comput. Syst. Sci.

[11] Elad Hazan, et al. Logistic Regression: Tight Bounds for Stochastic and Online Optimization, 2014, COLT.

[12] Sébastien Bubeck, et al. Sampling from a Log-Concave Distribution with Projected Langevin Monte Carlo, 2015, Discrete & Computational Geometry.

[13] Benjamin Doerr, et al. Probabilistic Tools for the Analysis of Randomized Optimization Heuristics, 2018, Theory of Evolutionary Computation.

[14] Manfred K. Warmuth, et al. Relative Loss Bounds for On-Line Density Estimation with the Exponential Family of Distributions, 1999, Machine Learning.

[15] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.

[16] Elad Hazan, et al. Introduction to Online Convex Optimization, 2016, Found. Trends Optim.

[17] P. Gaillard, et al. Efficient improper learning for online logistic regression, 2020, COLT.

[18] V. Vovk. Competitive On-line Statistics, 2001.

[19] Gábor Lugosi, et al. Prediction, Learning, and Games, 2006.