Invariant Lipschitz Bandits: A Side Observation Approach

Symmetry arises in many optimization and decision-making problems and has attracted considerable attention from the optimization community: by exploiting such symmetries, the search for optimal solutions can be accelerated significantly. Despite this success in (offline) optimization, the exploitation of symmetry has not been well examined in online optimization settings, especially in the bandit literature. We therefore study the invariant Lipschitz bandit setting, a subclass of Lipschitz bandits in which both the reward function and the set of arms are preserved under a group of transformations. We introduce an algorithm named \texttt{UniformMesh-N}, which integrates side observations derived from group orbits into the \texttt{UniformMesh} algorithm (\cite{Kleinberg2005_UniformMesh}), a baseline that uniformly discretizes the set of arms. Using this side-observation approach, we prove an improved regret upper bound that depends on the cardinality of the group, provided the group is finite. We also prove a matching regret lower bound for the invariant Lipschitz bandit class (up to logarithmic factors). We hope that our work will spark further investigation of symmetry in bandit theory and in sequential decision-making more generally.
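To make the side-observation mechanism concrete, below is a minimal Python sketch of orbit-based observation sharing inside a uniform-discretization UCB loop. It is only an illustration under simplifying assumptions (arm space [0, 1], the group given as a finite list of maps, UCB1-style indices, Gaussian reward noise); the names uniform_mesh_n, reward_fn, and group are placeholders rather than the paper's notation, and the sketch does not reproduce the paper's exact discretization rate or confidence widths.

\begin{verbatim}
import numpy as np

def uniform_mesh_n(reward_fn, group, K=64, horizon=10_000, rng=None):
    """Sketch: discretize [0, 1] into K arms; after each pull, share the
    observation with every arm in the pulled arm's group orbit."""
    rng = rng or np.random.default_rng(0)
    arms = (np.arange(K) + 0.5) / K           # mesh midpoints in [0, 1]
    counts, sums = np.zeros(K), np.zeros(K)

    # Precompute orbits: mesh indices reachable from arm i by some group
    # element, snapped to the nearest mesh point.
    orbits = [sorted({int(np.clip(round(g(x) * K - 0.5), 0, K - 1))
                      for g in group}) for x in arms]

    for t in range(1, horizon + 1):
        with np.errstate(divide="ignore", invalid="ignore"):
            ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
        ucb[counts == 0] = np.inf             # pull unexplored arms first
        i = int(np.argmax(ucb))
        r = reward_fn(arms[i]) + rng.normal(scale=0.1)  # noisy reward
        # Invariance means every arm in orbit(i) has the same mean reward,
        # so the single observation updates the whole orbit.
        for j in orbits[i]:
            counts[j] += 1
            sums[j] += r
    return arms[int(np.argmax(sums / np.maximum(counts, 1)))]

# Example: a reward invariant under the reflection x -> 1 - x.
group = [lambda x: x, lambda x: 1.0 - x]
print(uniform_mesh_n(lambda x: 1.0 - abs(x - 0.5), group))
\end{verbatim}

Roughly speaking, each pull here yields one observation for every arm in the pulled arm's orbit, which is the mechanism behind the improved upper bound's dependence on the cardinality of the group.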

[1] Taco Cohen, et al. A PAC-Bayesian Generalization Bound for Equivariant Networks, 2022, NeurIPS.

[2] Prateek Jain, et al. Online Low Rank Matrix Completion, 2022, ICLR.

[3] Bryn Elesedy, et al. Provably Strict Generalisation Benefit for Equivariant Models, 2021, ICML.

[4] Csaba Szepesvari, et al. Bandit Algorithms, 2020.

[5] D. Needell, et al. Online Matrix Factorization for Markovian Data and Applications to Network Dictionary Learning, 2019, arXiv.

[6] Akiyoshi Sannai, et al. Improved Generalization Bounds of Group Invariant/Equivariant Deep Networks via Quotient Feature Spaces, 2019, UAI.

[7] Thodoris Lykouris, et al. Graph Regret Bounds for Thompson Sampling and UCB, 2019, ALT.

[8] Aleksandrs Slivkins, et al. Introduction to Multi-Armed Bandits, 2019, Found. Trends Mach. Learn.

[9] Larisa Shwartz, et al. Online Interactive Collaborative Filtering Using Multi-Armed Bandit with Dependent Arms, 2017, IEEE Transactions on Knowledge and Data Engineering.

[10] Huazheng Wang, et al. Factorization Bandits for Interactive Recommendation, 2017, AAAI.

[11] Junwei Lu, et al. Symmetry, Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization, 2016, 2018 Information Theory and Applications Workshop (ITA).

[12] Alexandros G. Dimakis, et al. Contextual Bandits with Latent Confounders: An NMF Approach, 2016, AISTATS.

[13] Tomer Koren, et al. Online Learning with Feedback Graphs Without the Graphs, 2016, ICML.

[14] Long Tran-Thanh, et al. Efficient Thompson Sampling for Online Matrix-Factorization Recommendation, 2015, NIPS.

[15] Noga Alon, et al. Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback, 2014, SIAM J. Comput.

[16] Eli Upfal, et al. Bandits and Experts in Metric Spaces, 2013, J. ACM.

[17] Noga Alon, et al. From Bandits to Experts: A Tale of Domination and Independence, 2013, NIPS.

[18] Ohad Shamir, et al. On the Complexity of Bandit and Derivative-Free Stochastic Convex Optimization, 2012, COLT.

[19] Marc Lelarge, et al. Leveraging Side Observations in Stochastic Bandits, 2012, UAI.

[20] Toby Walsh, et al. Symmetry Breaking Constraints: Recent Results, 2012, AAAI.

[21] Csaba Szepesvári, et al. Improved Algorithms for Linear Stochastic Bandits, 2011, NIPS.

[22] Shie Mannor, et al. From Bandits to Experts: On the Value of Side-Observations, 2011, NIPS.

[23] Jia Yuan Yu, et al. Lipschitz Bandits without the Lipschitz Constant, 2011, ALT.

[24] Csaba Szepesvari, et al. X-Armed Bandits, 2010, J. Mach. Learn. Res.

[25] Guillermo Sapiro, et al. Online Learning for Matrix Factorization and Sparse Coding, 2009, J. Mach. Learn. Res.

[26] Jeff T. Linderoth, et al. Orbital Branching, 2007, Math. Program.

[27] Peter Auer, et al. Improved Rates for the Stochastic Continuum-Armed Bandit Problem, 2007, COLT.

[28] Toby Walsh, et al. The Complexity of Reasoning with Global Constraints, 2007, Constraints.

[29] Robert D. Kleinberg. Nearly Tight Bounds for the Continuum-Armed Bandit Problem, 2004, NIPS.

[30] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.

[31] François Margot. Pruning by Isomorphism in Branch-and-Cut, 2001, Math. Program.

[32] R. Agrawal. The Continuum-Armed Bandit Problem, 1995.

[33] H. Robbins. Some Aspects of the Sequential Design of Experiments, 1952.

[34] W. R. Thompson. On the Likelihood That One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.

[35] J. Ratcliffe. Foundations of Hyperbolic Manifolds, 2019, Graduate Texts in Mathematics.