Adversarial Multimodal Representation Learning for Click-Through Rate Prediction

For better user experience and business effectiveness, Click-Through Rate (CTR) prediction has been one of the most important tasks in E-commerce. Although extensive CTR prediction models have been proposed, learning good representation of items from multimodal features is still less investigated, considering an item in E-commerce usually contains multiple heterogeneous modalities. Previous works either concatenate the multiple modality features, that is equivalent to giving a fixed importance weight to each modality; or learn dynamic weights of different modalities for different items through technique like attention mechanism. However, a problem is that there usually exists common redundant information across multiple modalities. The dynamic weights of different modalities computed by using the redundant information may not correctly reflect the different importance of each modality. To address this, we explore the complementarity and redundancy of modalities by considering modality-specific and modality-invariant features differently. We propose a novel Multimodal Adversarial Representation Network (MARN) for the CTR prediction task. A multimodal attention network first calculates the weights of multiple modalities for each item according to its modality-specific features. Then a multimodal adversarial network learns modality-invariant representations where a double-discriminators strategy is introduced. Finally, we achieve the multimodal item representations by combining both modality-specific and modality-invariant representations. We conduct extensive experiments on both public and industrial datasets, and the proposed method consistently achieves remarkable improvements to the state-of-the-art methods. Moreover, the approach has been deployed in an operational E-commerce system and online A/B testing further demonstrates the effectiveness.

[1]  Xing Xie,et al.  Neural News Recommendation with Attentive Multi-View Learning , 2019, IJCAI.

[2]  Alexandros Karatzoglou,et al.  Session-based Recommendations with Recurrent Neural Networks , 2015, ICLR.

[3]  Yiqiang Chen,et al.  Transfer Learning with Dynamic Adversarial Adaptation Network , 2019, 2019 IEEE International Conference on Data Mining (ICDM).

[4]  Dumitru Erhan,et al.  Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Xiaogang Wang,et al.  Multi-source Deep Learning for Human Pose Estimation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Gang Chen,et al.  Personal recommendation using deep recurrent neural networks in NetEase , 2016, 2016 IEEE 32nd International Conference on Data Engineering (ICDE).

[7]  Hao Zhou,et al.  AutoCross: Automatic Feature Crossing for Tabular Data in Real-World Applications , 2019, KDD.

[8]  Paul Covington,et al.  Deep Neural Networks for YouTube Recommendations , 2016, RecSys.

[9]  Ho-Jin Choi,et al.  Personalized news recommendation using classified keywords to capture user preference , 2014, 16th International Conference on Advanced Communication Technology.

[10]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[11]  Jeffrey Pennington,et al.  GloVe: Global Vectors for Word Representation , 2014, EMNLP.

[12]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[13]  Louis-Philippe Morency,et al.  Multimodal Machine Learning: A Survey and Taxonomy , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Guorui Zhou,et al.  Deep Interest Network for Click-Through Rate Prediction , 2017, KDD.

[15]  Steffen Rendle,et al.  Factorization Machines , 2010, 2010 IEEE International Conference on Data Mining.

[16]  Wu-Jun Li,et al.  Coupled Group Lasso for Web-Scale CTR Prediction in Display Advertising , 2014, ICML.

[17]  Yang Yang,et al.  Adversarial Cross-Modal Retrieval , 2017, ACM Multimedia.

[18]  Xiaodong He,et al.  A Multi-View Deep Learning Approach for Cross Domain User Modeling in Recommendation Systems , 2015, WWW.

[19]  Haoran Li,et al.  Multi-modal Sentence Summarization with Modality Attention and Image Filtering , 2018, IJCAI.

[20]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[21]  Martin Wattenberg,et al.  Ad click prediction: a view from the trenches , 2013, KDD.

[22]  Feiping Nie,et al.  Dense Multimodal Fusion for Hierarchically Joint Representation , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[23]  Liang Zhang,et al.  LiJAR: A System for Job Application Redistribution towards Efficient Career Marketplace , 2017, KDD.

[24]  Anton van den Hengel,et al.  Image-Based Recommendations on Styles and Substitutes , 2015, SIGIR.

[25]  Yoram Singer,et al.  Adaptive Subgradient Methods for Online Learning and Stochastic Optimization , 2011, J. Mach. Learn. Res..

[26]  Chang Zhou,et al.  Deep Interest Evolution Network for Click-Through Rate Prediction , 2018, AAAI.

[27]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[28]  Chuan Chen,et al.  Learning Semantic Representations for Unsupervised Domain Adaptation , 2018, ICML.

[29]  John Riedl,et al.  Item-based collaborative filtering recommendation algorithms , 2001, WWW '01.

[30]  Jun Wang,et al.  Exploring Inter-feature and Inter-class Relationships with Deep Neural Networks for Video Classification , 2014, ACM Multimedia.

[31]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[32]  Xiang Li,et al.  Perceive Your Users in Depth: Learning Universal User Representations from Multiple E-commerce Tasks , 2018, KDD.

[33]  Yong Liu,et al.  Improved Recurrent Neural Networks for Session-based Recommendations , 2016, DLRS@RecSys.

[34]  Victor S. Lempitsky,et al.  Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.

[35]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36]  Juhan Nam,et al.  Multimodal Deep Learning , 2011, ICML.

[37]  Xing Xie,et al.  xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems , 2018, KDD.

[38]  Benjamin Schrauwen,et al.  Deep content-based music recommendation , 2013, NIPS.