Covariance-adapting algorithm for semi-bandits with application to sparse outcomes

We investigate stochastic combinatorial semi-bandits, where the entire joint distribution of outcomes governs the complexity of the problem instance (unlike in standard bandits). The distributions typically considered depend on parameters whose values must be known in advance in theory, yet are difficult to estimate in practice; the commonly assumed sub-Gaussian family is one example. We alleviate this issue by instead considering a new, more general family of sub-exponential distributions, which contains both bounded and Gaussian distributions. We prove a new regret lower bound over this family, parameterized by the unknown covariance matrix, a tighter quantity than the sub-Gaussian matrix. We then construct an algorithm that relies on covariance estimates, and provide a tight asymptotic analysis of its regret. Finally, we apply and extend our results to the family of sparse outcomes, which has applications in many recommender systems.
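The covariance estimates the abstract refers to can be maintained online from semi-bandit feedback, since each round reveals the outcome coordinates of the played subset and hence the pairwise products within it. Below is a minimal, hypothetical sketch (not the paper's algorithm) of a plug-in empirical covariance estimator under this observation model; class and method names are illustrative only.

```python
import numpy as np

class PairwiseCovarianceEstimator:
    """Running empirical covariance from semi-bandit feedback.

    Each round reveals only the coordinates inside the played subset,
    so a pair (i, j) accumulates statistics only in rounds where both
    coordinates are observed. This is a simple plug-in estimator, not
    the estimator analyzed in the paper.
    """

    def __init__(self, d):
        self.d = d
        self.n = np.zeros((d, d))   # joint observation counts per pair
        self.s = np.zeros(d)        # per-coordinate sums (for means)
        self.m = np.zeros(d)        # per-coordinate observation counts
        self.xx = np.zeros((d, d))  # accumulated outer products

    def update(self, played, outcome):
        """played: observed coordinate indices; outcome: full outcome
        vector, of which only the played coordinates are used."""
        idx = np.asarray(played)
        x = np.asarray(outcome, dtype=float)[idx]
        self.s[idx] += x
        self.m[idx] += 1
        self.n[np.ix_(idx, idx)] += 1
        self.xx[np.ix_(idx, idx)] += np.outer(x, x)

    def covariance(self):
        """Plug-in covariance estimate; pairs never jointly observed
        get entry 0. Uses global coordinate means, a simplification
        that is exact when coordinates are always observed together."""
        mean = np.divide(self.s, self.m,
                         out=np.zeros(self.d), where=self.m > 0)
        return np.where(self.n > 0,
                        self.xx / np.maximum(self.n, 1)
                        - np.outer(mean, mean),
                        0.0)
```

A covariance-adaptive index policy would then feed this estimate into its confidence widths in place of a conservative, a-priori sub-Gaussian proxy matrix.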
