Parameter-Free Online Learning via Model Selection

We introduce an efficient algorithmic framework for model selection in online learning, also known as parameter-free online learning. Departing from previous work, which has focused on highly structured function classes such as nested balls in Hilbert space, we propose a generic meta-algorithm framework that achieves online model selection oracle inequalities under minimal structural assumptions. We give the first computationally efficient parameter-free algorithms that work in arbitrary Banach spaces under mild smoothness assumptions; previous results applied only to Hilbert spaces. We further derive new oracle inequalities for matrix classes, non-nested convex sets, and $\mathbb{R}^{d}$ with generic regularizers. Finally, we generalize these results by providing oracle inequalities for arbitrary non-linear classes in the online supervised learning model. These results are all derived through a unified meta-algorithm scheme using a novel "multi-scale" algorithm for prediction with expert advice based on random playout, which may be of independent interest.
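
For readers unfamiliar with prediction with expert advice, the following is a minimal sketch of the classical exponential-weights (Hedge) forecaster, which is the standard baseline for that setting. It is not the "multi-scale" algorithm of this paper: it assumes every expert's loss lies in [0, 1], whereas a multi-scale method is meant to handle experts whose losses have different ranges. The function name `hedge`, its arguments, and the choice of learning rate are illustrative assumptions, not taken from the paper.

```python
import math


def hedge(loss_rounds, eta):
    """Classical exponential-weights (Hedge) forecaster.

    Illustrative baseline only, NOT the paper's multi-scale algorithm.
    Assumes each entry of loss_rounds is a list of per-expert losses in [0, 1].
    Returns the learner's cumulative expected loss and each expert's cumulative loss.
    """
    n_experts = len(loss_rounds[0])
    log_weights = [0.0] * n_experts  # log-domain weights for numerical stability
    learner_loss = 0.0

    for losses in loss_rounds:
        # Normalize the weights into a probability distribution over experts.
        shift = max(log_weights)
        unnormalized = [math.exp(w - shift) for w in log_weights]
        total = sum(unnormalized)
        probs = [u / total for u in unnormalized]

        # The learner suffers the expected loss under its current distribution.
        learner_loss += sum(p * l for p, l in zip(probs, losses))

        # Multiplicative update: down-weight experts in proportion to their loss.
        log_weights = [w - eta * l for w, l in zip(log_weights, losses)]

    cumulative = [sum(r[i] for r in loss_rounds) for i in range(n_experts)]
    return learner_loss, cumulative


if __name__ == "__main__":
    T, N = 200, 3
    # Deterministic toy losses: two alternating experts and one constant expert.
    rounds = [[float(t % 2), float((t + 1) % 2), 0.3] for t in range(T)]
    eta = math.sqrt(8.0 * math.log(N) / T)  # standard tuning when T is known
    learner, experts = hedge(rounds, eta)
    print("regret vs. best expert:", learner - min(experts))
```

The learning rate above depends on the horizon T and on the common loss range, so it must be known in advance. Removing that kind of tuning, and letting the guarantee adapt to the comparator, is the motivation for parameter-free methods such as those described in the abstract.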
