Scale-free online learning

We design and analyze algorithms for online linear optimization that achieve optimal regret while requiring no prior upper or lower bounds on the norms of the loss vectors. Our algorithms are instances of the Follow the Regularized Leader (FTRL) and Mirror Descent (MD) meta-algorithms. They adapt to the norms of the loss vectors through scale invariance: the algorithms make exactly the same decisions if the sequence of loss vectors is multiplied by any positive constant. The algorithm based on FTRL works for any decision set, bounded or unbounded; for unbounded decision sets, it is the first adaptive algorithm for online linear optimization with a non-vacuous regret bound. In contrast, we prove lower bounds for scale-free algorithms based on MD on unbounded domains.
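To illustrate the scale-invariance property described above, the sketch below implements an FTRL-style learner for online linear optimization on an unbounded Euclidean domain, where the quadratic regularizer grows with the square root of the cumulative squared norms of the observed losses. This is a minimal sketch in the spirit of the FTRL instance discussed in the abstract, not a verbatim transcription of the paper's algorithm; the class name `ScaleFreeFTRL`, the choice of Euclidean norm, and the zero prediction at the first round are illustrative assumptions.

```python
import numpy as np


class ScaleFreeFTRL:
    """Sketch of FTRL for online linear optimization on an unbounded Euclidean domain.

    Prediction: w_t = -(sum of past losses) / sqrt(sum of past squared norms).
    Multiplying every loss vector by a constant c > 0 scales the numerator by c
    and the denominator by c as well, so the predictions are unchanged
    (scale invariance).
    """

    def __init__(self, dim):
        self.sum_losses = np.zeros(dim)   # running sum of loss vectors
        self.sum_sq_norms = 0.0           # running sum of ||l_s||_2^2

    def predict(self):
        if self.sum_sq_norms == 0.0:
            # No information observed yet: predict the origin (illustrative choice).
            return np.zeros_like(self.sum_losses)
        return -self.sum_losses / np.sqrt(self.sum_sq_norms)

    def update(self, loss_vec):
        loss_vec = np.asarray(loss_vec, dtype=float)
        self.sum_losses += loss_vec
        self.sum_sq_norms += float(loss_vec @ loss_vec)


if __name__ == "__main__":
    # Scale-invariance check: the predictions produced against losses l_t and
    # against c * l_t coincide for every positive constant c.
    rng = np.random.default_rng(0)
    losses = rng.normal(size=(100, 5))
    for c in (1e-3, 1.0, 1e3):
        learner = ScaleFreeFTRL(dim=5)
        last_pred = None
        for l in losses:
            last_pred = learner.predict()
            learner.update(c * l)
        print(c, np.round(last_pred, 4))  # same vector printed for every c
```

The check at the bottom rescales the entire loss sequence by several constants and verifies that the final prediction is identical in each case, which is exactly the scale-invariance property stated above.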
