Online convex optimization and no-regret learning: Algorithms, guarantees and applications

Spurred by the enthusiasm surrounding the "Big Data" paradigm, the mathematical and algorithmic tools of online optimization have found widespread use in problems where the trade-off between data exploration and exploitation plays a predominant role. This trade-off is of particular importance to several branches and applications of signal processing, such as data mining, statistical inference, multimedia indexing, and wireless communications (to name but a few). With this in mind, the aim of this tutorial paper is to provide a gentle introduction to online optimization and learning algorithms that are asymptotically optimal in hindsight, i.e., that approach the performance of a virtual algorithm with unlimited computational power and full knowledge of the future, a property known as no-regret. Particular attention is devoted to identifying the algorithms' theoretical performance guarantees and to establishing links with classic optimization paradigms (both static and stochastic). To facilitate a better understanding of this toolbox, we provide several examples throughout the tutorial, ranging from metric learning to wireless resource allocation problems.
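To make the no-regret property concrete, the sketch below implements online projected gradient descent, a classic no-regret algorithm, on a toy sequence of quadratic losses. It is a minimal illustration, not the tutorial's own code: the loss sequence, feasible set (the Euclidean unit ball), and step-size schedule are all assumptions chosen for simplicity. The quantity printed at the end compares the learner's cumulative loss to that of the best *fixed* action in hindsight; for a no-regret algorithm this gap grows sublinearly in T, so its per-round average vanishes.

```python
import numpy as np

# Illustrative sketch: online projected gradient descent (OGD) on the
# Euclidean unit ball, with losses f_t(x) = 0.5 * ||x - z_t||^2.
# For convex losses with bounded gradients, OGD with step size
# eta_t = 1/sqrt(t) achieves regret O(sqrt(T)).

rng = np.random.default_rng(0)
T, d = 1000, 5
targets = rng.normal(size=(T, d))    # loss parameters revealed one per round

def project(v):
    """Project v onto the Euclidean unit ball."""
    n = np.linalg.norm(v)
    return v if n <= 1.0 else v / n

x = np.zeros(d)                      # learner's initial action
losses = []
for t in range(1, T + 1):
    z = targets[t - 1]
    losses.append(0.5 * np.sum((x - z) ** 2))  # loss incurred this round
    grad = x - z                               # gradient of f_t at x
    x = project(x - grad / np.sqrt(t))         # gradient step + projection

# Best fixed action in hindsight: for these quadratics, the projected mean.
x_star = project(targets.mean(axis=0))
opt = sum(0.5 * np.sum((x_star - z) ** 2) for z in targets)
regret = sum(losses) - opt
print(f"average regret after {T} rounds: {regret / T:.4f}")
```

The per-round average regret shrinks as T grows, which is exactly the "asymptotically optimal in hindsight" behavior described above: no fixed strategy, chosen even with full knowledge of the loss sequence, outperforms the learner by more than a vanishing margin per round.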
