A Joint Dynamic Ranking System with DNN and Vector-based Clustering Bandit

The ad-ranking module is the core of an advertising recommender system. Existing ad-ranking modules are mainly based on deep neural network (DNN) click-through rate prediction models. Recently, an innovative ad-ranking paradigm called DNN-MAB was introduced to address the DNN-only paradigm's weakness in perceiving highly dynamic user intent over time. We introduced the DNN-MAB paradigm into our ad-ranking system to alleviate the Matthew effect, which harms the user experience. Due to data sparsity, however, the actual performance of DNN-MAB is lower than expected. In this paper, we propose an innovative ad-ranking paradigm called DNN-VMAB to solve these problems. Based on vectorization and clustering, it exploits latent collaborative information in user behavior data to find a set of ads with higher relevance and diversity. By integrating the strengths of classical collaborative filtering, deep click-through rate prediction models, and contextual multi-armed bandits, it can improve both platform revenue and user experience. Both offline and online experiments show the advantage of our new algorithm over DNN-MAB and several other existing algorithms.
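The abstract does not spell out the DNN-VMAB algorithm itself, but the general idea it names, grouping item embedding vectors into clusters and running a multi-armed bandit whose arms are those clusters, can be sketched as follows. This is a minimal illustration only, assuming nearest-centroid cluster assignment and a standard UCB1 score; the class and function names are hypothetical and not taken from the paper.

```python
import math


def assign_clusters(vectors, centroids):
    """Assign each item embedding vector to its nearest centroid (squared distance)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centroids)), key=lambda k: sqdist(v, centroids[k]))
            for v in vectors]


class ClusterUCB:
    """UCB1 bandit whose arms are clusters of ad embedding vectors."""

    def __init__(self, n_clusters):
        self.counts = [0] * n_clusters      # plays per cluster
        self.rewards = [0.0] * n_clusters   # cumulative reward per cluster

    def select(self):
        # Play every cluster once before applying the UCB rule.
        for k, c in enumerate(self.counts):
            if c == 0:
                return k
        total = sum(self.counts)

        def ucb(k):
            mean = self.rewards[k] / self.counts[k]
            return mean + math.sqrt(2.0 * math.log(total) / self.counts[k])

        return max(range(len(self.counts)), key=ucb)

    def update(self, k, reward):
        self.counts[k] += 1
        self.rewards[k] += reward
```

In a full system, the selected cluster would then be re-ranked internally (e.g. by the DNN click-through rate model), which is where the collaborative and deep components the abstract mentions would come in.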
