Attentive Collaborative Filtering: Multimedia Recommendation with Item- and Component-Level Attention

Multimedia content is dominating today's Web information. The nature of multimedia user-item interactions is 1/0 binary implicit feedback (e.g., photo likes, video views, song downloads, etc.), which can be collected at a larger scale with a much lower cost than explicit feedback (e.g., product ratings). However, the majority of existing collaborative filtering (CF) systems are not well-designed for multimedia recommendation, since they ignore the implicitness in users' interactions with multimedia content. We argue that, in multimedia recommendation, there exists item- and component-level implicitness which blurs the underlying users' preferences. The item-level implicitness means that users' preferences on items (e.g. photos, videos, songs, etc.) are unknown, while the component-level implicitness means that inside each item users' preferences on different components (e.g. regions in an image, frames of a video, etc.) are unknown. For example, a 'view'' on a video does not provide any specific information about how the user likes the video (i.e.item-level) and which parts of the video the user is interested in (i.e.component-level). In this paper, we introduce a novel attention mechanism in CF to address the challenging item- and component-level implicit feedback in multimedia recommendation, dubbed Attentive Collaborative Filtering (ACF). Specifically, our attention model is a neural network that consists of two attention modules: the component-level attention module, starting from any content feature extraction network (e.g. CNN for images/videos), which learns to select informative components of multimedia items, and the item-level attention module, which learns to score the item preferences. ACF can be seamlessly incorporated into classic CF models with implicit feedback, such as BPR and SVD++, and efficiently trained using SGD. Through extensive experiments on two real-world multimedia Web services: Vine and Pinterest, we show that ACF significantly outperforms state-of-the-art CF methods.

[1]  Tat-Seng Chua,et al.  Micro Tells Macro: Predicting the Popularity of Micro-Videos via a Transductive Model , 2016, ACM Multimedia.

[2]  Jingyuan Chen,et al.  Multi-Modal Learning: Study on A Large-Scale Micro-Video Data Collection , 2016, ACM Multimedia.

[3]  Lars Schmidt-Thieme,et al.  BPR: Bayesian Personalized Ranking from Implicit Feedback , 2009, UAI.

[4]  Tat-Seng Chua,et al.  Neural Collaborative Filtering , 2017, WWW.

[5]  Jiebo Luo,et al.  Image Captioning with Semantic Attention , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Yang Yang,et al.  One of a Kind: User Profiling by Social Curation , 2014, ACM Multimedia.

[7]  Tao Mei,et al.  VideoReach: an online video recommendation system , 2007, SIGIR.

[8]  Michael J. Pazzani,et al.  Content-Based Recommendation Systems , 2007, The Adaptive Web.

[9]  Cristian Sminchisescu,et al.  Spatio-Temporal Attention Models for Grounded Video Captioning , 2016, ACCV.

[10]  Yehuda Koren,et al.  Factorization meets the neighborhood: a multifaceted collaborative filtering model , 2008, KDD.

[11]  John Riedl,et al.  Item-based collaborative filtering recommendation algorithms , 2001, WWW '01.

[12]  Tat-Seng Chua,et al.  Fast Matrix Factorization for Online Recommendation with Implicit Feedback , 2016, SIGIR.

[13]  Huanbo Luan,et al.  Discrete Collaborative Filtering , 2016, SIGIR.

[14]  Benjamin Schrauwen,et al.  Deep content-based music recommendation , 2013, NIPS.

[15]  Tat-Seng Chua,et al.  Unifying Virtual and Physical Worlds , 2017, ACM Trans. Inf. Syst..

[16]  Zhi-Dan Zhao,et al.  User-Based Collaborative-Filtering Recommendation Algorithms on Hadoop , 2010, 2010 Third International Conference on Knowledge Discovery and Data Mining.

[17]  Zhou Su,et al.  What Videos Are Similar with You?: Learning a Common Attributed Representation for Video Recommendation , 2014, ACM Multimedia.

[18]  Yue Gao,et al.  Attribute-augmented semantic hierarchy: towards bridging semantic gap and intention gap in image retrieval , 2013, ACM Multimedia.

[19]  Meng Wang,et al.  Multimedia recommendation: technology and techniques , 2013, SIGIR.

[20]  Tat-Seng Chua,et al.  Computational Social Indicators: A Case Study of Chinese University Ranking , 2017, SIGIR.

[21]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[22]  Shih-Fu Chang,et al.  Visual Translation Embedding Network for Visual Relation Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[24]  Tat-Seng Chua,et al.  SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Yong Yu,et al.  SVDFeature: a toolkit for feature-based collaborative filtering , 2012, J. Mach. Learn. Res..

[26]  Tat-Seng Chua,et al.  Item Silk Road: Recommending Items from Information Domains to Social Users , 2017, SIGIR.

[27]  Meng Wang,et al.  Multimodal Graph-Based Reranking for Web Image Search , 2012, IEEE Transactions on Image Processing.

[28]  Huan Liu,et al.  What Your Images Reveal: Exploiting Visual Contents for Point-of-Interest Recommendation , 2017, WWW.

[29]  Tao Chen,et al.  Context-aware Image Tweet Modelling and Recommendation , 2016, ACM Multimedia.

[30]  Ming Gao,et al.  BiRank: Towards Ranking on Bipartite Graphs , 2017, IEEE Transactions on Knowledge and Data Engineering.

[31]  Tat-Seng Chua,et al.  Cross-Domain Recommendation via Clustering on Multi-Layer Graphs , 2017, SIGIR.

[32]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Tat-Seng Chua,et al.  Shorter-is-Better: Venue Category Estimation from Micro-Video , 2016, ACM Multimedia.

[34]  Yongfeng Zhang,et al.  Personalized Key Frame Recommendation , 2017, SIGIR.

[35]  Jialie Shen,et al.  On Effective Location-Aware Music Recommendation , 2016, ACM Trans. Inf. Syst..

[36]  Yifan Hu,et al.  Collaborative Filtering for Implicit Feedback Datasets , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[37]  Qiang Yang,et al.  One-Class Collaborative Filtering , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[38]  Vanja Josifovski,et al.  Up next: retrieval methods for large scale related video suggestion , 2014, KDD.

[39]  George Karypis,et al.  FISM: factored item similarity models for top-N recommender systems , 2013, KDD.

[40]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[41]  Meng Wang,et al.  Visual Classification by ℓ1-Hypergraph Modeling , 2015, IEEE Trans. Knowl. Data Eng..

[42]  Steffen Rendle,et al.  Factorization Machines , 2010, 2010 IEEE International Conference on Data Mining.

[43]  Trevor Darrell,et al.  Modeling Relationships in Referential Expressions with Compositional Modular Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Tat-Seng Chua,et al.  Learning Image and User Features for Recommendation in Social Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[45]  Tao Mei,et al.  Personalized video recommendation through tripartite graph propagation , 2012, ACM Multimedia.

[46]  Shankar Kumar,et al.  Video suggestion and discovery for youtube: taking random walks through the view graph , 2008, WWW.