R-UCB: a Contextual Bandit Algorithm for Risk-Aware Recommender Systems

Mobile Context-Aware Recommender Systems can be naturally modelled as an exploration/exploitation (exr/exp) trade-off, in which the system must choose between maximizing its expected reward using its current knowledge (exploitation) and learning more about the user's unknown preferences to improve that knowledge (exploration). The reinforcement learning community has addressed this problem, but existing approaches do not consider the risk level of the user's current situation: when the risk is high, recommending items the user does not desire in that situation may be harmful. In this paper we introduce R-UCB, an algorithm that uses the risk level of the user's situation to adaptively balance exr and exp. A detailed analysis of the experimental results reveals several important findings about exr/exp behaviour.
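The abstract does not specify how the risk level enters the bandit formula. One plausible reading, sketched below purely for illustration, is a UCB1-style score whose exploration bonus is scaled down as the situation's risk grows; the function name `r_ucb_select` and the parameter `alpha` are hypothetical, not taken from the paper.

```python
import math

def r_ucb_select(counts, values, risk, t, alpha=2.0):
    """Pick an arm by a risk-modulated UCB score (illustrative sketch).

    counts[i] : times arm i has been played
    values[i] : empirical mean reward of arm i
    risk      : situation risk level in [0, 1]; high risk shrinks exploration
    t         : current round (1-indexed)
    alpha     : base exploration weight (hypothetical parameter)
    """
    # Play each unplayed arm once before applying the UCB formula.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    # As risk -> 1 the exploration bonus vanishes, so the system exploits
    # its current knowledge; as risk -> 0 the rule reduces to ordinary UCB.
    explore_w = alpha * (1.0 - risk)
    scores = [v + explore_w * math.sqrt(math.log(t) / n)
              for v, n in zip(values, counts)]
    return max(range(len(scores)), key=scores.__getitem__)
```

In this sketch a high-risk situation (e.g. risk = 1.0) makes the system pick the arm with the best empirical mean, while a safe situation lets the usual confidence bonus favour under-explored arms.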
