Online Nonparametric Learning, Chaining, and the Role of Partial Feedback

We investigate contextual online learning with nonparametric (Lipschitz) comparison classes under different assumptions on losses and feedback. For full-information feedback and Lipschitz losses, we characterize the minimax regret up to logarithmic factors by proving an upper bound that matches a previously known lower bound. In a partial feedback model motivated by second-price auctions, we prove upper bounds for Lipschitz and semi-Lipschitz losses that improve on the known bounds for standard bandit feedback. Our analysis combines new results for contextual second-price auctions with a novel algorithmic approach based on chaining. When the context space is Euclidean, our chaining approach is computationally efficient and delivers an even better regret bound.
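
For concreteness, the regret in this setting can be formalized as follows (a standard formulation written in our own notation; the symbols below are illustrative and not taken verbatim from the paper). At each round t the learner observes a context x_t, plays an action \hat y_t, and suffers loss \ell_t(\hat y_t), competing against the best L-Lipschitz mapping from contexts to actions:

    R_T = \sum_{t=1}^{T} \ell_t(\hat y_t) \;-\; \inf_{f \in \mathcal{F}_L} \sum_{t=1}^{T} \ell_t\bigl(f(x_t)\bigr),
    \qquad \mathcal{F}_L = \bigl\{ f : |f(x) - f(x')| \le L\,\rho(x, x') \ \text{for all } x, x' \bigr\}.

In the auction-motivated partial feedback model, a representative instance (standard second-price mechanics with a reserve price, included here purely as an illustration) has the learner post a reserve price p_t; with highest and second-highest bids b_t^{(1)} \ge b_t^{(2)}, the per-round revenue is \max\bigl(b_t^{(2)}, p_t\bigr)\,\mathbf{1}\{p_t \le b_t^{(1)}\}, and the learner observes only limited information about the bids rather than the full loss function.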
