Algorithmic Chaining and the Role of Partial Feedback in Online Nonparametric Learning

We investigate contextual online learning with nonparametric (Lipschitz) comparison classes under varying assumptions on the losses and on the feedback available to the learner. For full-information feedback and Lipschitz losses, we design the first explicit algorithm achieving the minimax regret rate (up to log factors). In a partial-feedback model motivated by second-price auctions, we obtain algorithms for Lipschitz and semi-Lipschitz losses whose regret bounds improve on the known bounds for standard bandit feedback. Our analysis combines new results for contextual second-price auctions with an algorithmic approach based on chaining. When the context space is Euclidean, our chaining technique is computationally efficient and achieves an even better regret bound.
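To make the comparison concrete, the standard regret notion in this setting (written in our own generic notation, not necessarily the paper's exact definition) pits the learner's cumulative loss against the best Lipschitz function in hindsight:

\[
R_T \;=\; \sum_{t=1}^{T} \ell_t(\hat{y}_t) \;-\; \inf_{f \in \mathrm{Lip}(\mathcal{X})} \sum_{t=1}^{T} \ell_t\bigl(f(x_t)\bigr),
\]

where $x_t$ is the context revealed at round $t$, $\hat{y}_t$ is the learner's prediction, $\ell_t$ is the (Lipschitz or semi-Lipschitz) loss at round $t$, and $\mathrm{Lip}(\mathcal{X})$ denotes the class of $1$-Lipschitz functions on the context space $\mathcal{X}$.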
