On Submodular Contextual Bandits

We consider the problem of contextual bandits where actions are subsets of a ground set and mean rewards are modeled by an unknown monotone submodular function that belongs to a class F. We allow time-varying matroid constraints to be placed on the feasible sets. Assuming access to an online regression oracle with regret Reg_sq(F, n), our algorithm efficiently randomizes around local optima of estimated functions according to the Inverse Gap Weighting strategy [AL99, FR20]. We show that the cumulative regret of this procedure with time horizon n scales as O(√(n · Reg_sq(F, n))) against a benchmark with a multiplicative factor of 1/2. On the other hand, using the techniques of [FW14], we show that an ε-Greedy procedure with local randomization attains regret of O(n^{2/3} · Reg_sq(F, n)^{1/3}) against the stronger (1 − 1/e) benchmark.
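
As a point of reference, the following is a minimal sketch (not taken from the paper) of the Inverse Gap Weighting distribution of [AL99, FR20] over a finite collection of candidate actions. The function name inverse_gap_weighting, the estimated rewards f_hat, and the exploration parameter gamma are illustrative placeholders; in the paper the randomization is applied around local optima of the estimated submodular function under the matroid constraint, not over an explicit enumeration of all feasible sets.

```python
import numpy as np

def inverse_gap_weighting(f_hat, gamma):
    """Sketch of Inverse Gap Weighting over a finite set of candidate actions.

    f_hat : estimated mean rewards, one per candidate action (hypothetical estimates)
    gamma : exploration parameter; larger values concentrate mass on the estimated optimum
    Returns a probability vector over the candidate actions.
    """
    f_hat = np.asarray(f_hat, dtype=float)
    k = len(f_hat)                      # number of candidate actions
    best = int(np.argmax(f_hat))        # estimated optimum among the candidates
    gaps = f_hat[best] - f_hat          # non-negative estimated gaps to the optimum
    p = 1.0 / (k + gamma * gaps)        # weight each action inversely to its gap
    p[best] = 0.0
    p[best] = 1.0 - p.sum()             # remaining mass goes to the estimated optimum
    return p

# Example: sample one action from the resulting distribution
rng = np.random.default_rng(0)
probs = inverse_gap_weighting(f_hat=[0.3, 0.7, 0.5], gamma=10.0)
action = rng.choice(len(probs), p=probs)
```

The key property used in regression-oracle analyses is that actions with small estimated gaps are played with probability on the order of 1/(k + gamma · gap), which trades off exploration against the estimated suboptimality.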

[1] Filip Radlinski et al. Learning diverse rankings with multi-armed bandits. ICML, 2008.

[2] Zheng Wen et al. Cascading Bandits: Learning to Rank in the Cascade Model. ICML, 2015.

[3] Philip M. Long et al. Associative Reinforcement Learning using Linear Probabilistic Concepts. ICML, 1999.

[4] Thorsten Joachims et al. Predicting diverse subsets using structural SVMs. ICML, 2008.

[5] Alexander Rakhlin et al. Beyond UCB: Optimal and Efficient Contextual Bandits with Regression Oracles. ICML, 2020.

[6] Yisong Yue et al. Linear Submodular Bandits and their Application to Diversified Retrieval. NIPS, 2011.

[7] M. L. Fisher et al. An analysis of approximations for maximizing submodular set functions—I. Math. Program., 1978.

[8] Ambuj Tewari et al. Online learning via sequential complexities. J. Mach. Learn. Res., 2010.

[9] Karthik Sridharan et al. Hierarchies of Relaxations for Online Prediction Problems with Evolving Constraints. COLT, 2015.

[10] Rad Niazadeh et al. Online Learning via Offline Greedy Algorithms: Applications in Market Design and Optimization. EC, 2020.

[11] Yuval Filmus et al. Monotone Submodular Maximization over a Matroid via Non-Oblivious Local Search. SIAM J. Comput., 2012.

[12] Raghu Meka et al. Learning One Convolutional Layer with Overlapping Patches. ICML, 2018.

[13] Ali Shameli et al. Ranking an Assortment of Products Via Sequential Submodular Optimization. arXiv, 2020.

[14] Jan Vondrák et al. Optimal approximation for submodular and supermodular optimization with bounded curvature. SODA, 2013.

[15] Andreas Krause et al. Cost-effective outbreak detection in networks. KDD, 2007.

[16] Hui Lin et al. Learning Mixtures of Submodular Shells with Application to Document Summarization. UAI, 2012.

[17] Vahab S. Mirrokni et al. Non-monotone submodular maximization under matroid and knapsack constraints. STOC, 2009.

[18] Lexing Ying et al. Top-k eXtreme Contextual Bandits with Arm Hierarchy. ICML, 2021.

[19] Gábor Lugosi et al. Prediction, Learning, and Games. 2006.

[20] Adam Tauman Kalai et al. The Isotron Algorithm: High-Dimensional Isotonic Regression. COLT, 2009.

[21] Andreas Krause et al. Near-optimal Observation Selection using Submodular Functions. AAAI, 2007.

[22] Andreas Krause et al. Submodular Function Maximization. In Tractability, 2014.

[23] Alexander Schrijver. Combinatorial Optimization: Polyhedra and Efficiency. 2003.

[24] Amin Karbasi et al. Adaptivity in Adaptive Submodularity. COLT, 2019.

[25] Shuai Li et al. Online Learning to Rank with Features. ICML, 2018.

[26] Karthik Sridharan et al. Online Non-Parametric Regression. COLT, 2014.

[27] Adam Tauman Kalai et al. Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression. NIPS, 2011.

[28] Zheng Wen et al. Adaptive Submodular Maximization in Bandit Setting. NIPS, 2013.

[29] Zheng Wen et al. Cascading Bandits for Large-Scale Recommendation Problems. UAI, 2016.

[30] Andreas Krause et al. Interactive Submodular Bandit. NIPS, 2017.