Disentangled Self-Attentive Neural Networks for Click-Through Rate Prediction

Click-Through Rate (CTR) prediction, which aims to estimate the probability that a user will click on an item, is an essential task for many online applications. Because CTR data are sparse and high-dimensional, a key to effective prediction is modeling high-order feature interactions. An efficient way to do this is to compute inner products of feature embeddings with self-attentive neural networks. To better model complex feature interactions, in this paper we propose a novel DisentanglEd Self-atTentIve NEtwork (DESTINE) framework for CTR prediction that explicitly decouples the computation of unary feature importance from that of pairwise feature interactions. Specifically, the unary term models the general importance of one feature to all other features, whereas the pairwise term learns the pure impact of each feature pair. We conduct extensive experiments on two real-world benchmark datasets. The results show that DESTINE not only maintains computational efficiency but also achieves consistent improvements over state-of-the-art baselines.
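As a concrete illustration of the decoupling described above, the sketch below implements one disentangled self-attention layer in PyTorch. It is a minimal reading of the abstract, not the authors' reference implementation: the whitening of queries and keys for the pairwise term, the key-only linear scorer for the unary term, and all layer names and dimensions are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DisentangledSelfAttention(nn.Module):
    """Sketch of a disentangled self-attention layer over feature fields.

    Assumption: attention logits split into a whitened pairwise term and a
    key-only unary term, following the decoupling described in the abstract.
    """

    def __init__(self, embed_dim: int, attn_dim: int):
        super().__init__()
        self.W_q = nn.Linear(embed_dim, attn_dim, bias=False)
        self.W_k = nn.Linear(embed_dim, attn_dim, bias=False)
        self.W_v = nn.Linear(embed_dim, attn_dim, bias=False)
        # Key-only scorer for the unary (general importance) term.
        self.unary = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_fields, embed_dim) -- one embedding per feature field.
        q, k, v = self.W_q(x), self.W_k(x), self.W_v(x)
        # Pairwise term: subtract the mean from queries and keys so the
        # logits capture only the pure pairwise interaction, with each
        # feature's overall salience factored out.
        q_w = q - q.mean(dim=1, keepdim=True)
        k_w = k - k.mean(dim=1, keepdim=True)
        pairwise = torch.matmul(q_w, k_w.transpose(1, 2)) / q.size(-1) ** 0.5
        # Unary term: a query-independent score for how important each
        # feature is to all other features.
        unary = self.unary(k).transpose(1, 2)  # (batch, 1, num_fields)
        # Normalizing the two terms separately keeps them disentangled:
        # the unary score cannot be absorbed into the pairwise logits.
        attn = F.softmax(pairwise, dim=-1) + F.softmax(unary, dim=-1)
        return torch.matmul(attn, v)  # (batch, num_fields, attn_dim)
```

Usage under these assumptions: given field embeddings of shape `(batch, num_fields, embed_dim)`, the layer returns interaction-aware representations of shape `(batch, num_fields, attn_dim)`, which can then be flattened and fed to a prediction head.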
