Paper information - Recommendation System-based Upper Confidence Bound for Online Advertising

This paper presents UCB-RS, a method that uses a recommendation system (RS) to enhance the upper confidence bound (UCB) algorithm. The method addresses multi-armed bandit problems that are non-stationary and have large state spaces, and it targets product recommendation in online advertising. In extensive tests with RecoGym, an OpenAI Gym-based reinforcement learning environment for product recommendation in online advertising, the proposed method outperforms widely used reinforcement learning schemes such as $\epsilon$-Greedy, Upper Confidence Bound (UCB1), and Exponential Weights for Exploration and Exploitation (EXP3).
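
For orientation, the UCB1 baseline named in the abstract can be sketched as follows. This is a minimal illustration of standard UCB1 only, not the UCB-RS method, whose details the abstract does not give. The function names `ucb1_select` and `run_ucb1`, the Bernoulli click simulation, and the click probabilities are all assumptions made for this example; RecoGym is not used here.

```python
import math
import random


def ucb1_select(counts, sums, t):
    """Return the arm with the highest UCB1 score at round t (1-based).

    counts[a] is how often arm a was played, sums[a] its total reward.
    """
    # Play every arm once before the confidence bound is defined.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    # Score = empirical mean + exploration bonus sqrt(2 ln t / n_a).
    scores = [
        sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def run_ucb1(click_probs, rounds, seed=0):
    """Simulate UCB1 on hypothetical stationary Bernoulli arms (assumed setup)."""
    rng = random.Random(seed)
    counts = [0] * len(click_probs)
    sums = [0.0] * len(click_probs)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, sums, t)
        # A reward of 1.0 stands in for a click on the recommended product.
        reward = 1.0 if rng.random() < click_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


if __name__ == "__main__":
    # Two hypothetical products with click probabilities 0.1 and 0.9.
    print(run_ucb1([0.1, 0.9], rounds=2000))
```

The sketch assumes stationary rewards and a handful of arms, which is the setting where plain UCB1 is expected to work; the abstract positions UCB-RS for the non-stationary, large-state-space case.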
