Online Newton Step Algorithm with Estimated Gradient

Online learning with limited-information (bandit) feedback addresses the setting in which an online learner receives only partial feedback from the environment during learning. In this setting, Flaxman et al. [8] extended Zinkevich's classical Online Gradient Descent (OGD) algorithm [29] by proposing the Online Gradient Descent with Expected Gradient (OGDEG) algorithm. Specifically, OGDEG approximates the gradient of the loss function $f_t$ by evaluating $f_t$ at a single point, and it achieves an expected regret bound of $\mathcal{O}(T^{5/6})$ [8], where $T$ is the number of rounds. Meanwhile, prior work has shown that second-order online learning algorithms such as Online Newton Step (ONS) [11] can converge significantly faster than first-order algorithms. Motivated by this, this paper exploits second-order information to speed up the convergence of the OGDEG algorithm. In particular, we extend the ONS algorithm with the expected-gradient trick and develop a novel second-order online learning algorithm, Online Newton Step with Expected Gradient (ONSEG). Theoretically, we show that the proposed ONSEG algorithm reduces the expected regret of the OGDEG algorithm from $\mathcal{O}(T^{5/6})$ to $\mathcal{O}(T^{2/3})$ in the bandit feedback scenario. Empirically, we further demonstrate the advantages of the proposed algorithm on multiple real-world datasets.
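The single-point gradient trick mentioned above can be illustrated with a minimal sketch. It assumes the commonly used estimator $\hat{g} = \frac{d}{\delta} f(x + \delta u)\, u$, with $u$ drawn uniformly from the unit sphere in $\mathbb{R}^d$, and pairs it with a textbook ONS-style update. The function names, the parameters `delta` and `gamma`, and the omission of the projection back onto the feasible set are all assumptions made for illustration; this is not the ONSEG algorithm as specified in the paper.

```python
import numpy as np

def sample_unit_sphere(d, rng):
    """Draw a point uniformly from the unit sphere in R^d."""
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)

def one_point_gradient(f, x, delta, rng):
    """Estimate a gradient from a single evaluation of f (bandit feedback).

    Assumed estimator: (d / delta) * f(x + delta * u) * u, where u is
    uniform on the unit sphere. Its expectation is the gradient of a
    smoothed version of f, not of f itself in general.
    """
    d = x.shape[0]
    u = sample_unit_sphere(d, rng)
    return (d / delta) * f(x + delta * u) * u

def ons_step(x, A, g, gamma):
    """One ONS-style update using a (possibly estimated) gradient g.

    Simplification: the projection onto the feasible set is omitted.
    """
    A_new = A + np.outer(g, g)                    # accumulate second-order information
    x_new = x - np.linalg.solve(A_new, g) / gamma  # Newton-like step
    return x_new, A_new
```

For a linear loss $f(x) = a^\top x$ the smoothed function coincides with $f$, so averaging many single-point estimates should approach $a$. This makes a convenient sanity check for the estimator before plugging it into the update.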

[1] Adam Tauman Kalai, et al. Online convex optimization in the bandit setting: gradient descent without a gradient, 2004, SODA '05.

[2] Sanjeev Arora, et al. Efficient algorithms for online convex optimization and their applications, 2006.

[3] Shai Shalev-Shwartz, et al. Online Learning and Online Convex Optimization, 2012, Found. Trends Mach. Learn.

[4] Philip S. Yu, et al. Mining concept-drifting data streams using ensemble classifiers, 2003, KDD '03.

[5] Patrick Thiran, et al. Stochastic Optimization with Bandit Sampling, 2017, ArXiv.

[6] Steven C. H. Hoi, et al. Online Learning: A Comprehensive Survey, 2018, Neurocomputing.

[7] Santosh S. Vempala, et al. Simulated annealing in convex bodies and an $O^*(n^4)$ volume algorithm, 2006, J. Comput. Syst. Sci.

[8] Ohad Shamir, et al. On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization, 2012, COLT.

[9] Elad Hazan, et al. Bandit Convex Optimization: Towards Tight Bounds, 2014, NIPS.

[10] Mehryar Mohri, et al. Optimistic Bandit Convex Optimization, 2016, NIPS.

[11] Gergely Neu, et al. An Efficient Algorithm for Learning with Semi-bandit Feedback, 2013, ALT.

[12] Ryoji Kataoka, et al. Robust Online Learning to Rank via Selective Pairwise Approach Based on Evaluation Measures, 2013.

[13] Tjalling J. Ypma, et al. Historical Development of the Newton-Raphson Method, 1995, SIAM Rev.

[14] Santosh S. Vempala, et al. Simulated annealing in convex bodies and an $O^*(n^4)$ volume algorithm, 2003, 44th Annual IEEE Symposium on Foundations of Computer Science, 2003. Proceedings.

[15] Elad Hazan, et al. Interior-Point Methods for Full-Information and Bandit Online Learning, 2012, IEEE Transactions on Information Theory.

[16] Robert D. Kleinberg. Nearly Tight Bounds for the Continuum-Armed Bandit Problem, 2004, NIPS.

[17] Csaba Szepesvári, et al. Online Learning to Rank in Stochastic Click Models, 2017, ICML.

[18] Ambuj Tewari, et al. Improved Regret Guarantees for Online Smooth Convex Optimization with Bandit Feedback, 2011, AISTATS.

[19] Miklós Simonovits, et al. Random walks and an $O^*(n^5)$ volume algorithm for convex bodies, 1997, Random Struct. Algorithms.

[20] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent, 2003, ICML.

[21] Thomas P. Hayes, et al. The Price of Bandit Information for Online Optimization, 2007, NIPS.

[22] Mingyan Liu, et al. Online algorithms for the multi-armed bandit problem with Markovian rewards, 2010, 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[23] Elad Hazan, et al. Logarithmic regret algorithms for online convex optimization, 2006, Machine Learning.

[24] Yi Ding, et al. Large Scale Kernel Methods for Online AUC Maximization, 2017, 2017 IEEE International Conference on Data Mining (ICDM).

[25] Shai Shalev-Shwartz, et al. On Graduated Optimization for Stochastic Non-Convex Problems, 2015, ICML.

[26] Elad Hazan, et al. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization, 2008, COLT.

[27] V. Milman, et al. Isotropic position and inertia ellipsoids and zonoids of the unit ball of a normed n-dimensional space, 1989.

[28] Steven C. H. Hoi, et al. LIBOL: a library for online learning algorithms, 2014, J. Mach. Learn. Res.

[29] Yin Tat Lee, et al. Kernel-based methods for bandit convex optimization, 2016, STOC.

[30] Lin Xiao, et al. Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback, 2010, COLT 2010.

[31] Yuanzhi Li, et al. An optimal algorithm for bandit convex optimization, 2016, ArXiv.

[32] Avrim Blum, et al. Online Geometric Optimization in the Bandit Setting Against an Adaptive Adversary, 2004, COLT.

[33] Geoffrey J. Gordon. Regret bounds for prediction problems, 1999, COLT '99.