Recommendation System-based Upper Confidence Bound for Online Advertising

This paper presents UCB-RS, a method that uses a recommendation system (RS) to enhance the upper confidence bound (UCB) algorithm. UCB-RS is designed for multi-armed bandit problems that are non-stationary and have large state spaces, and it targets product recommendation in online advertising. In extensive tests with RecoGym, an OpenAI Gym-based reinforcement learning environment for product recommendation in online advertising, UCB-RS outperforms widely used reinforcement learning schemes such as $\epsilon$-Greedy, Upper Confidence Bound (UCB1), and Exponential Weights for Exploration and Exploitation (EXP3).
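For orientation, the UCB1 baseline named above can be sketched as follows. This is a minimal illustration of standard UCB1 on a simulated Bernoulli bandit, not the UCB-RS method itself, whose details are not given here. The function names `ucb1_select` and `run_ucb1`, the click probabilities, and the simulated reward loop are all assumptions made for the example and do not come from RecoGym.

```python
import math
import random


def ucb1_select(counts, sums, t):
    """Return the arm with the highest UCB1 score at round t (1-based).

    counts[a] is the number of times arm a was played,
    sums[a] is the total reward collected from arm a.
    """
    # Play every arm once before the confidence bound is defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Score = empirical mean + exploration bonus sqrt(2 ln t / n_a).
    scores = [
        sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_ucb1(click_probs, rounds, seed=0):
    """Simulate UCB1 on hypothetical Bernoulli arms; return pull counts."""
    rng = random.Random(seed)
    counts = [0] * len(click_probs)
    sums = [0.0] * len(click_probs)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, sums, t)
        # Hypothetical click model: reward 1 with probability click_probs[arm].
        reward = 1.0 if rng.random() < click_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    # Three hypothetical products with made-up click probabilities.
    print(run_ucb1([0.1, 0.5, 0.9], rounds=2000))
```

The exploration bonus shrinks as an arm is played more often, so rarely shown products are still tried occasionally. Plain UCB1 assumes stationary rewards and keeps one statistic per arm, which is why it struggles in the non-stationary, large-state-space setting the abstract describes.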
