Learning to Interact with Users: A Collaborative-Bandit Approach

Learning to interact with users and discover their preferences is central to most web applications, with recommender systems being a notable example. From such a perspective, merging interactive learning algorithms with recommendation models is natural. While recent literature has explored combining collaborative filtering approaches with bandit techniques, existing methods suffer from two limitations: (1) they usually assume Gaussian rewards, which are not suitable for the implicit feedback data powering most recommender systems, and (2) they are restricted to the one-item recommendation setting, whereas in practice a list of recommendations is typically presented. In this paper, to address these limitations, we consider Bernoulli rewards in addition to Gaussian rewards, the former being well suited to dyadic (click/no-click) data. We also study two user click models: the one-item click/no-click model and the cascade click model, the latter being suitable for top-K recommendations. For these settings, we propose novel machine learning algorithms that learn to interact with users by estimating the underlying parameters collaboratively across users and items. We provide an extensive empirical study, which is the first to report all pairwise comparisons among interactive learning algorithms for recommendation. Our experiments demonstrate that when the number of users and items is large, propagating feedback across users and items while learning latent features is the most effective way for a system to learn to interact with its users.
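
To make the setting concrete, the minimal Python sketch below illustrates one way the pieces described above can fit together: a Thompson-sampling-style matrix-factorization bandit with Bernoulli rewards and a cascade click model. It is an illustrative assumption rather than the paper's exact algorithm, and the names (recommend, cascade_feedback, update) are hypothetical. The system recommends a top-K list from perturbed latent factors, observes at most one click as the simulated user scans the list top-down, and updates the user and item factors with a logistic (Bernoulli) loss so that feedback propagates through the shared latent space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sketch (assumed setup, not the paper's exact method):
# Thompson-sampling-style matrix-factorization bandit with Bernoulli
# rewards and a cascade click model for top-K recommendation.

n_users, n_items, d, K = 50, 200, 5, 5
lam, lr = 0.1, 0.05                      # L2 regularization and SGD step size

# Latent user/item factors, initialized randomly and refined online.
U = 0.1 * rng.standard_normal((n_users, d))
V = 0.1 * rng.standard_normal((n_items, d))

# Hidden "true" preferences used only to simulate clicks in this demo.
U_true = rng.standard_normal((n_users, d))
V_true = rng.standard_normal((n_items, d))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recommend(u, noise=0.1):
    """Sample perturbed factors (Thompson-style exploration) and rank items."""
    u_tilde = U[u] + noise * rng.standard_normal(d)
    V_tilde = V + noise * rng.standard_normal(V.shape)
    scores = sigmoid(V_tilde @ u_tilde)       # Bernoulli click probabilities
    return np.argsort(-scores)[:K]            # top-K list for the cascade model

def cascade_feedback(u, items):
    """User scans the list top-down and clicks at most one item (cascade model)."""
    for rank, i in enumerate(items):
        p_click = sigmoid(U_true[u] @ V_true[i])
        if rng.random() < p_click:
            return rank                        # position of the click
    return None                                # no click on the whole list

def update(u, items, click_rank):
    """Logistic (Bernoulli) SGD update on the examined prefix of the list."""
    examined = items if click_rank is None else items[:click_rank + 1]
    for rank, i in enumerate(examined):
        y = 1.0 if (click_rank is not None and rank == click_rank) else 0.0
        p = sigmoid(U[u] @ V[i])
        g = p - y                              # gradient of the logistic loss
        u_old = U[u].copy()
        U[u] -= lr * (g * V[i] + lam * U[u])
        V[i] -= lr * (g * u_old + lam * V[i])

# Simulated interaction loop: recommend, observe cascade feedback, update.
for t in range(2000):
    u = rng.integers(n_users)
    items = recommend(u)
    click = cascade_feedback(u, items)
    update(u, items, click)
```

Because the item factors V are shared across all users, one user's clicks reshape the recommendations served to other users, which is the collaborative propagation of feedback emphasized in the abstract.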
