Adaptive scale-invariant online algorithms for learning linear models

We consider online learning with linear models, where the algorithm predicts on sequentially revealed instances (feature vectors) and is compared against the best linear function (comparator) in hindsight. Popular algorithms in this framework, such as Online Gradient Descent (OGD), have parameters (learning rates) that ideally should be tuned to the scales of the features and of the optimal comparator, but these quantities only become available at the end of the learning process. In this paper, we resolve the tuning problem by proposing online algorithms whose predictions are invariant under arbitrary rescaling of the features. The algorithms have no parameters to tune, require no prior knowledge of the scale of the instances or the comparator, and achieve regret bounds matching (up to a logarithmic factor) those of OGD with optimally tuned separate learning rates per dimension, while retaining comparable runtime performance.
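To make the setting concrete, below is a minimal sketch (not the paper's algorithm) of the online linear prediction protocol with the OGD baseline the abstract refers to: a separate learning rate per dimension on squared loss. The learning-rate vector `eta` and the synthetic data are hypothetical; as the abstract notes, a good choice of `eta` would depend on the feature scales and the comparator, which are unknown in advance.

```python
import numpy as np

def ogd_per_dimension(instances, labels, eta):
    """Online gradient descent on squared loss with a separate learning
    rate per dimension. `eta` is a per-coordinate learning-rate vector
    that would ideally be tuned to the (unknown) feature scales and
    optimal comparator; this is the baseline the paper's scale-invariant
    algorithms match up to a logarithmic factor."""
    d = instances.shape[1]
    w = np.zeros(d)                     # current linear model
    total_loss = 0.0
    for x, y in zip(instances, labels):
        y_hat = w @ x                   # predict before the label is revealed
        total_loss += 0.5 * (y_hat - y) ** 2
        grad = (y_hat - y) * x          # gradient of squared loss w.r.t. w
        w -= eta * grad                 # per-coordinate update
    return w, total_loss

# Hypothetical usage: rescaling a feature changes OGD's behavior unless
# eta is retuned, which is exactly the sensitivity scale-invariant
# algorithms avoid.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
w, cumulative_loss = ogd_per_dimension(X, y, eta=np.full(3, 0.05))
```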
