DEAR: Deep Reinforcement Learning for Online Advertising Impression in Recommender Systems

With the recent prevalence of Reinforcement Learning (RL), there has been tremendous interest in utilizing RL for online advertising on recommendation platforms (e.g., e-commerce and news feed sites). However, most RL-based advertising algorithms focus on optimizing ad revenue while ignoring the possible negative influence of ads on the user experience of recommended items (products, articles and videos). Developing an optimal advertising algorithm for recommendations faces immense challenges, because interpolating ads improperly or too frequently may degrade the user experience, while interpolating fewer ads will reduce advertising revenue. Thus, in this paper, we propose a novel advertising strategy for the rec/ads trade-off. Specifically, we develop an RL-based framework that can continuously update its advertising strategies and maximize long-term reward. Given a recommendation list, we design a novel Deep Q-network architecture that can jointly determine three internally related tasks, i.e., (i) whether to interpolate an ad into the recommendation list, and if so, (ii) the optimal ad and (iii) the optimal location at which to interpolate it. Experimental results on real-world data demonstrate the effectiveness of the proposed framework.
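To make the "three decisions at once" idea concrete, the Python sketch below shows one way such a joint choice could be read off Q-values: the "no ad" option is scored on the same scale as every (ad, slot) pair, and a single argmax settles whether to insert, which ad, and where. Everything named here is an illustrative assumption, not the paper's architecture or API: the `q_insert`/`q_skip` interfaces, the slot convention (L + 1 slots for a list of L items), the ε-greedy exploration, the `hypothetical_reward` weighting `alpha`, and the toy Q-values. In a real system the Q-values would come from a trained Deep Q-network.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Illustrative interfaces (assumptions, not the DEAR paper's API). In practice
# both would be outputs of a trained Deep Q-network.
#   q_insert(state, ad) -> L + 1 Q-values, one per insertion slot: slot j puts
#                          the ad before the j-th recommended item (0-based),
#                          and slot L appends it after the last item.
#   q_skip(state)       -> Q-value of showing the list without any ad.
QInsert = Callable[[Sequence[float], str], List[float]]
QSkip = Callable[[Sequence[float]], float]
Action = Tuple[Optional[int], Optional[int]]  # (ad index, slot) or (None, None)


def select_action(state: Sequence[float], ads: Sequence[str], q_insert: QInsert,
                  q_skip: QSkip, list_len: int, epsilon: float = 0.0,
                  rng: Optional[random.Random] = None) -> Action:
    """Jointly decide (i) whether to insert, (ii) which ad, (iii) which slot."""
    if rng is None:
        rng = random.Random()
    n_slots = list_len + 1
    if rng.random() < epsilon:
        # Epsilon-greedy exploration: uniform over "skip" and every (ad, slot) pair.
        k = rng.randrange(1 + len(ads) * n_slots)
        return (None, None) if k == 0 else divmod(k - 1, n_slots)
    # Greedy: "no ad" competes directly with every (ad, slot) pair, so all
    # three decisions come out of a single argmax.
    best_action: Action = (None, None)
    best_q = q_skip(state)
    for i, ad in enumerate(ads):
        q_values = q_insert(state, ad)
        if len(q_values) != n_slots:
            raise ValueError("q_insert must return list_len + 1 values")
        for slot, q in enumerate(q_values):
            if q > best_q:
                best_q, best_action = q, (i, slot)
    return best_action


def apply_action(rec_list: Sequence[str], ads: Sequence[str], action: Action) -> List[str]:
    """Return the list the user would see after applying the chosen action."""
    ad_idx, slot = action
    if ad_idx is None:
        return list(rec_list)
    return list(rec_list[:slot]) + [ads[ad_idx]] + list(rec_list[slot:])


def hypothetical_reward(ad_revenue: float, user_left: bool, alpha: float = 1.0) -> float:
    # Assumed trade-off: ad income minus a penalty when the user abandons the session.
    return ad_revenue - alpha * (1.0 if user_left else 0.0)


def td_target(reward: float, next_max_q: float, gamma: float = 0.9, done: bool = False) -> float:
    # Standard one-step Q-learning target: r + gamma * max_a' Q(s', a').
    return reward if done else reward + gamma * next_max_q


# Toy usage with hand-written Q-values (made up for illustration only).
def toy_q_insert(state, ad):
    return [0.1, 0.9, 0.3, 0.2] if ad == "ad_y" else [0.2, 0.1, 0.4, 0.0]


def toy_q_skip(state):
    return 0.5


rec_list = ["item_a", "item_b", "item_c"]
ads = ["ad_x", "ad_y"]
action = select_action([0.2, 0.5], ads, toy_q_insert, toy_q_skip, len(rec_list))
print(action)                               # (1, 1)
print(apply_action(rec_list, ads, action))  # ['item_a', 'ad_y', 'item_b', 'item_c']
```

Because skipping is scored on the same scale as every placement, this sketch declines to show an ad whenever no (ad, slot) pair beats the no-ad value; that single comparison is where the rec/ads trade-off enters the greedy decision, while the TD target is what lets the learned Q-values reflect long-term rather than immediate reward.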
