Exploring Content-based Video Relevance for Video Click-Through Rate Prediction

This paper describes our solution for the Hulu Challenge. To answer the challenge, we introduce two content-based models, namely, Cascading Mapping Network (CMN) and Relevant-Enhanced Deep Interest Network (REDIN). CMN predicts video Click-Through Rate (CTR) by predicting content-based video relevance. REDIN mainly improves the popular Deep Interest Network by adding explicit video relevance constraint, which provides guidance for low-level video feature learning thus helpful for CTR prediction. Based on the two models, our solution obtains Area Under Curve (AUC) score of 0.6022 and 0.6155 on the TV-shows and Movie track respectively. What is more, we are one of the only two teams giving scores of over 0.6 on both tracks. The results justify the effectiveness and stability of our proposed solution.

[1]  Yunming Ye,et al.  DeepFM: A Factorization-Machine based Neural Network for CTR Prediction , 2017, IJCAI.

[2]  Xun Wang,et al.  Feature Re-Learning with Data Augmentation for Content-based Video Recommendation , 2018, ACM Multimedia.

[3]  Marc'Aurelio Ranzato,et al.  DeViSE: A Deep Visual-Semantic Embedding Model , 2013, NIPS.

[4]  Apostol Natsev,et al.  YouTube-8M: A Large-Scale Video Classification Benchmark , 2016, ArXiv.

[5]  Fei-Fei Li,et al.  Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Dong Liu,et al.  Content-Based Video Relevance Prediction with Second-Order Relevance and Attention Modeling , 2018, ACM Multimedia.

[7]  Yann LeCun,et al.  A Closer Look at Spatiotemporal Convolutions for Action Recognition , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Xirong Li,et al.  Predicting Visual Features From Text for Image and Video Caption Retrieval , 2017, IEEE Transactions on Multimedia.

[9]  Chang Zhou,et al.  Deep Interest Evolution Network for Click-Through Rate Prediction , 2018, AAAI.

[10]  Xiaohui Xie,et al.  Content-based Video Relevance Prediction Challenge: Data, Protocol, and Baseline , 2018, ArXiv.

[11]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[12]  Gang Fu,et al.  Deep & Cross Network for Ad Click Predictions , 2017, ADKDD@KDD.

[13]  Tat-Seng Chua,et al.  Neural Factorization Machines for Sparse Predictive Analytics , 2017, SIGIR.

[14]  Xirong Li,et al.  TagBook: A Semantic Video Representation Without Supervision for Event Detection , 2015, IEEE Transactions on Multimedia.

[15]  Heng-Tze Cheng,et al.  Wide & Deep Learning for Recommender Systems , 2016, DLRS@RecSys.

[16]  Guorui Zhou,et al.  Deep Interest Network for Click-Through Rate Prediction , 2017, KDD.

[17]  Xirong Li,et al.  Dual Encoding for Zero-Example Video Retrieval , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Xirong Li,et al.  Early Embedding and Late Reranking for Video Captioning , 2016, ACM Multimedia.

[19]  Aren Jansen,et al.  CNN architectures for large-scale audio classification , 2016, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).