Paper Information - Attentive Collaborative Filtering: Multimedia Recommendation with Item- and Component-Level Attention

Multimedia content dominates today's Web information. Multimedia user-item interactions are by nature 1/0 binary implicit feedback (e.g., photo likes, video views, song downloads), which can be collected at a larger scale and a much lower cost than explicit feedback (e.g., product ratings). However, the majority of existing collaborative filtering (CF) systems are not well designed for multimedia recommendation, since they ignore the implicitness in users' interactions with multimedia content. We argue that, in multimedia recommendation, there exists item- and component-level implicitness that blurs users' underlying preferences. Item-level implicitness means that users' preferences on items (e.g., photos, videos, songs) are unknown, while component-level implicitness means that, inside each item, users' preferences on different components (e.g., regions in an image, frames of a video) are unknown. For example, a 'view' on a video does not provide any specific information about how much the user likes the video (i.e., item level) or which parts of the video the user is interested in (i.e., component level). In this paper, we introduce a novel attention mechanism in CF, dubbed Attentive Collaborative Filtering (ACF), to address the challenging item- and component-level implicit feedback in multimedia recommendation. Specifically, our attention model is a neural network that consists of two attention modules: the component-level attention module, which starts from any content feature extraction network (e.g., a CNN for images/videos) and learns to select informative components of multimedia items, and the item-level attention module, which learns to score the item preferences. ACF can be seamlessly incorporated into classic CF models with implicit feedback, such as BPR and SVD++, and efficiently trained using SGD. Through extensive experiments on two real-world multimedia Web services, Vine and Pinterest, we show that ACF significantly outperforms state-of-the-art CF methods.
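The two attention modules described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the form of the attention network, so the one-hidden-layer ReLU scorer, the softmax normalization, the SVD++-style sum of attended item embeddings, and every name and dimension here (`attention_weights`, `user_representation`, `d`, `d_c`, `h`) are assumptions made for illustration. Random vectors stand in for learned embeddings and for CNN component features.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_weights(query, feats, W_q, W_f, b, w):
    # Assumed scorer: one ReLU hidden layer conditioned on the user (query),
    # followed by a softmax over the n rows of `feats`.
    hidden = np.maximum(0.0, feats @ W_f + query @ W_q + b)  # (n, h)
    return softmax(hidden @ w)                               # (n,)

def init_params(d, d_c, h, rng):
    # One parameter block per attention module (hypothetical layout).
    def block(d_feat):
        return (rng.normal(scale=0.1, size=(d, h)),       # W_q
                rng.normal(scale=0.1, size=(d_feat, h)),  # W_f
                np.zeros(h),                              # b
                rng.normal(scale=0.1, size=h))            # w
    return {"component": block(d_c), "item": block(d + d_c)}

def user_representation(u, item_emb, comp_feats, params):
    # Component level: pool each item's M component features into one vector.
    pooled = np.stack([
        attention_weights(u, x, *params["component"]) @ x for x in comp_feats
    ])                                                        # (L, d_c)
    # Item level: weight each interacted item using its embedding and content.
    item_feats = np.concatenate([item_emb, pooled], axis=1)   # (L, d + d_c)
    alpha = attention_weights(u, item_feats, *params["item"])
    # SVD++-style user vector: own embedding plus attended item embeddings.
    return u + alpha @ item_emb, alpha

def bpr_loss(user_vec, v_pos, v_neg):
    # BPR pairwise loss: -log(sigmoid(score_pos - score_neg)).
    diff = user_vec @ v_pos - user_vec @ v_neg
    return np.logaddexp(0.0, -diff)

rng = np.random.default_rng(0)
d, d_c, h, L, M = 8, 16, 4, 5, 3          # illustrative sizes only
params = init_params(d, d_c, h, rng)
u = rng.normal(size=d)                    # user embedding
item_emb = rng.normal(size=(L, d))        # embeddings of L interacted items
comp_feats = rng.normal(size=(L, M, d_c)) # M component features per item
user_vec, alpha = user_representation(u, item_emb, comp_feats, params)
loss = bpr_loss(user_vec, rng.normal(size=d), rng.normal(size=d))
```

In this sketch `alpha` holds one non-negative weight per interacted item, and the weights sum to 1. A full model would learn the embeddings and attention parameters jointly by SGD on the BPR loss, as the abstract indicates.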
