Contextual Combinatorial Multi-armed Bandits with Volatile Arms and Submodular Reward

In this paper, we study the stochastic contextual combinatorial multi-armed bandit (CC-MAB) framework, tailored for volatile arms and submodular reward functions. CC-MAB inherits properties from both contextual bandits and combinatorial bandits: it aims to select a set of arms in each round based on the side information (a.k.a. context) associated with the arms. By "volatile arms", we mean that the set of arms available for selection may change from round to round; by "submodular rewards", we mean that the total reward achieved by the selected arms is not a simple sum of individual rewards but exhibits diminishing returns determined by the relations between the selected arms (e.g., relevance and redundancy). Volatile arms and submodular rewards arise in many real-world applications, e.g., recommender systems and crowdsourcing, in which multi-armed bandit (MAB) based strategies are extensively applied. Although existing works investigate these issues separately based on standard MAB, jointly considering all of them in a single MAB problem requires very different algorithm design and regret analysis. Our algorithm CC-MAB provides an online decision-making policy in a contextual and combinatorial bandit setting and effectively addresses the issues raised by volatile arms and submodular reward functions. The proposed algorithm is proven to achieve $O(cT^{\frac{2\alpha+D}{3\alpha + D}}\log(T))$ regret over a span of $T$ rounds. The performance of CC-MAB is evaluated by experiments on a real-world crowdsourcing dataset, and the results show that our algorithm outperforms the prior art.
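To make the two notions in the abstract concrete, here is a minimal toy sketch. It is not the CC-MAB algorithm itself: it omits contexts and learning entirely and assumes the reward function is known. It uses a set-coverage reward as one simple example of a submodular function, and a standard greedy rule that repeatedly adds the arm with the largest marginal gain. The worker names, skill sets, rounds, and budget are all hypothetical.

```python
from typing import Dict, Iterable, List, Set


def coverage_reward(selected: Iterable[str], skills: Dict[str, Set[str]]) -> int:
    """Toy submodular reward: number of distinct skills covered by the selected arms."""
    covered: Set[str] = set()
    for arm in selected:
        covered |= skills[arm]
    return len(covered)


def greedy_select(available: List[str], skills: Dict[str, Set[str]], budget: int) -> List[str]:
    """Pick up to `budget` arms, each time adding the arm with the largest marginal gain."""
    chosen: List[str] = []
    remaining = list(available)
    for _ in range(min(budget, len(remaining))):
        base = coverage_reward(chosen, skills)
        best = max(remaining, key=lambda a: coverage_reward(chosen + [a], skills) - base)
        if coverage_reward(chosen + [best], skills) == base:
            break  # no remaining arm adds anything new
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical workers (arms) and the skills each one covers.
SKILLS = {
    "w1": {"a", "b", "c"},
    "w2": {"b", "c"},
    "w3": {"d"},
    "w4": {"a", "d", "e"},
}

# Volatile arms: the available set differs from round to round.
ROUNDS = [["w1", "w2", "w3"], ["w2", "w3", "w4"], ["w1", "w4"]]

for t, available in enumerate(ROUNDS, start=1):
    picked = greedy_select(available, SKILLS, budget=2)
    print(t, picked, coverage_reward(picked, SKILLS))
```

In the first round, worker w2 alone would contribute two skills, but once w1 is selected its marginal gain drops to zero, so the greedy rule picks w3 instead. This is the diminishing-returns effect that distinguishes a submodular reward from a simple sum of individual rewards.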

Jie Xu | Lixing Chen | Zhuo Lu
