Conversational Contextual Bandit: Algorithm and Application

Contextual bandit algorithms provide principled online learning solutions to balance the exploitation-exploration trade-off in applications such as recommender systems. However, traditional contextual bandit algorithms often learn slowly because they require extensive exploration. This is a critical issue in applications like recommender systems, where users may have to provide feedback on many items they are not interested in. To accelerate learning, we generalize the contextual bandit to the conversational contextual bandit, which leverages not only behavioral feedback on arms (e.g., articles in news recommendation) but also occasional conversational feedback on key-terms from the user. Here, a key-term relates to a subset of arms, for example, a category of articles in news recommendation. We then design the Conversational UCB algorithm (ConUCB) to address two challenges in the conversational contextual bandit: (1) which key-terms to select for conversations, and (2) how to leverage conversational feedback to accelerate bandit learning. We theoretically prove that ConUCB achieves a smaller regret upper bound than the traditional contextual bandit algorithm LinUCB, which implies faster learning. Experiments on synthetic data, as well as real datasets from Yelp and Toutiao, demonstrate the efficacy of ConUCB.
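To make the setting concrete, the following is a minimal sketch of a conversational UCB-style loop, not the paper's exact ConUCB algorithm. It assumes a LinUCB-style linear reward model, represents a key-term's feature as the mean of its related arms' features, and uses a simple fixed-frequency conversation schedule; the helpers `should_converse` and `key_term_feature` are hypothetical placeholders for the paper's key-term selection and update rules.

```python
import numpy as np

class ConversationalUCBSketch:
    """Sketch of a linear UCB agent that also consumes key-term feedback.

    Assumption: both arm-level and key-term-level feedback update the same
    ridge-regression model, as a stand-in for ConUCB's actual updates.
    """

    def __init__(self, dim, alpha=1.0, lam=1.0):
        self.alpha = alpha                # exploration weight
        self.A = lam * np.eye(dim)        # ridge-regression Gram matrix
        self.b = np.zeros(dim)            # accumulated reward-weighted features

    def select_arm(self, arm_features):
        # UCB score: estimated reward plus a confidence width.
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        scores = [x @ theta + self.alpha * np.sqrt(x @ A_inv @ x)
                  for x in arm_features]
        return int(np.argmax(scores))

    def update(self, x, reward):
        # Standard ridge update: A <- A + x x^T, b <- b + r x.
        self.A += np.outer(x, x)
        self.b += reward * x

def should_converse(t, freq=50):
    # Placeholder schedule: ask one conversational question every `freq` rounds.
    return t % freq == 0

def key_term_feature(term_arms, arm_features):
    # Assumption: a key-term's feature is the mean of its related arms' features.
    return np.mean([arm_features[a] for a in term_arms], axis=0)

# Toy usage: 3-dim features, 5 arms, 2 key-terms relating to subsets of arms.
rng = np.random.default_rng(0)
theta_true = rng.normal(size=3)
arms = [rng.normal(size=3) for _ in range(5)]
terms = {"sports": [0, 1], "finance": [2, 3, 4]}

agent = ConversationalUCBSketch(dim=3)
for t in range(1, 201):
    if should_converse(t):
        # Simplification: alternate between key-terms instead of a
        # principled key-term selection criterion.
        name, term_arms = list(terms.items())[t % len(terms)]
        x_term = key_term_feature(term_arms, arms)
        agent.update(x_term, x_term @ theta_true + rng.normal(scale=0.1))
    a = agent.select_arm(arms)
    agent.update(arms[a], arms[a] @ theta_true + rng.normal(scale=0.1))
```

The toy loop illustrates the core idea from the abstract: occasional key-term feedback feeds extra observations into the same linear model, so the confidence region shrinks faster than with arm-level feedback alone.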
