Online convex optimization and no-regret learning: Algorithms, guarantees and applications

Spurred by the enthusiasm surrounding the "Big Data" paradigm, the mathematical and algorithmic tools of online optimization have found widespread use in problems where the trade-off between data exploration and exploitation plays a predominant role. This trade-off is of particular importance to several branches and applications of signal processing, such as data mining, statistical inference, multimedia indexing, and wireless communications (to name but a few). With this in mind, the aim of this tutorial paper is to provide a gentle introduction to online optimization and learning algorithms that are asymptotically optimal in hindsight, i.e., that approach the performance of a virtual algorithm with unlimited computational power and full knowledge of the future, a property known as no-regret. Particular attention is devoted to identifying the algorithms' theoretical performance guarantees and to establishing links with classic optimization paradigms (both static and stochastic). To facilitate a better understanding of this toolbox, we provide several examples throughout the tutorial, ranging from metric learning to wireless resource allocation problems.
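To make the no-regret property concrete, the sketch below implements online projected gradient descent, a classic no-regret algorithm, on a toy sequence of quadratic losses. It is a minimal illustration, not the tutorial's own code: the loss sequence, feasible set (the Euclidean unit ball), and step-size schedule are all assumptions chosen for simplicity. The quantity printed at the end compares the learner's cumulative loss to that of the best *fixed* action in hindsight; for a no-regret algorithm this gap grows sublinearly in T, so its per-round average vanishes.

```python
import numpy as np

# Illustrative sketch: online projected gradient descent (OGD) on the
# Euclidean unit ball, with losses f_t(x) = 0.5 * ||x - z_t||^2.
# For convex losses with bounded gradients, OGD with step size
# eta_t = 1/sqrt(t) achieves regret O(sqrt(T)).

rng = np.random.default_rng(0)
T, d = 1000, 5
targets = rng.normal(size=(T, d))    # loss parameters revealed one per round

def project(v):
    """Project v onto the Euclidean unit ball."""
    n = np.linalg.norm(v)
    return v if n <= 1.0 else v / n

x = np.zeros(d)                      # learner's initial action
losses = []
for t in range(1, T + 1):
    z = targets[t - 1]
    losses.append(0.5 * np.sum((x - z) ** 2))  # loss incurred this round
    grad = x - z                               # gradient of f_t at x
    x = project(x - grad / np.sqrt(t))         # gradient step + projection

# Best fixed action in hindsight: for these quadratics, the projected mean.
x_star = project(targets.mean(axis=0))
opt = sum(0.5 * np.sum((x_star - z) ** 2) for z in targets)
regret = sum(losses) - opt
print(f"average regret after {T} rounds: {regret / T:.4f}")
```

The per-round average regret shrinks as T grows, which is exactly the "asymptotically optimal in hindsight" behavior described above: no fixed strategy, chosen even with full knowledge of the loss sequence, outperforms the learner by more than a vanishing margin per round.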
