Ask the GRU: Multi-task Learning for Deep Text Recommendations

In many application domains, the content to be recommended to users is associated with text: research papers, movies with plot summaries, news articles, blog posts, and so on. Recommendation approaches based on latent factor models can be extended naturally to leverage text by employing an explicit mapping from text to factors. This enables recommendations for new, unseen content and may generalize better, since the factors for all items are produced by a compactly parametrized model. Previous work has used topic models or averages of word embeddings for this mapping. In this paper we present a method that uses deep recurrent neural networks, specifically gated recurrent units (GRUs), to encode the text sequence into a latent vector; the encoder is trained end-to-end on the collaborative filtering task. For the task of scientific paper recommendation, this yields models with significantly higher accuracy. In cold-start scenarios, we outperform the previous state-of-the-art methods, all of which ignore word order. Performance is further improved by multi-task learning, where the text encoder network is trained for a combination of content recommendation and item metadata prediction. This regularizes the collaborative filtering model, ameliorating the sparsity of the observed rating matrix.
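The pipeline described above (text encoded by a GRU into an item factor, scored against user factors, with a second metadata-prediction loss sharing the same encoder) can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the sizes, the single-layer unidirectional GRU, the mean pooling of hidden states, the logistic losses, the `tag_weight` mixing coefficient, and all function and parameter names are assumptions made here for illustration, and only the forward pass is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
VOCAB, EMBED, HIDDEN, N_USERS, N_TAGS = 50, 8, 16, 4, 5


def init(*shape):
    """Small random weights (assumed initialization)."""
    return rng.normal(scale=0.1, size=shape)


params = {
    "E": init(VOCAB, EMBED),                                # word embeddings
    "Wz": init(EMBED, HIDDEN), "Uz": init(HIDDEN, HIDDEN),  # update gate
    "Wr": init(EMBED, HIDDEN), "Ur": init(HIDDEN, HIDDEN),  # reset gate
    "Wh": init(EMBED, HIDDEN), "Uh": init(HIDDEN, HIDDEN),  # candidate state
    "users": init(N_USERS, HIDDEN),                         # user latent factors
    "tags": init(HIDDEN, N_TAGS),                           # metadata (tag) head
}


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def encode_text(token_ids, p):
    """Map a token sequence to an item factor with a GRU.

    Mean pooling over hidden states is an assumption of this sketch.
    """
    h = np.zeros(HIDDEN)
    states = []
    for t in token_ids:
        x = p["E"][t]
        z = sigmoid(x @ p["Wz"] + h @ p["Uz"])               # update gate
        r = sigmoid(x @ p["Wr"] + h @ p["Ur"])               # reset gate
        h_tilde = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"])   # candidate state
        h = (1.0 - z) * h + z * h_tilde
        states.append(h)
    return np.mean(states, axis=0)


def bce(logits, targets):
    """Binary cross-entropy on logits, in a numerically stable form."""
    return np.mean(np.logaddexp(0.0, logits) - targets * logits)


def multitask_loss(token_ids, user_likes, item_tags, p, tag_weight=0.5):
    """Recommendation loss plus weighted tag-prediction loss.

    Both terms depend on the same encoder output, so the metadata task
    acts as a regularizer on the text encoder.
    """
    item = encode_text(token_ids, p)
    rec_logits = p["users"] @ item   # one score per user
    tag_logits = item @ p["tags"]    # one score per tag
    return bce(rec_logits, user_likes) + tag_weight * bce(tag_logits, item_tags)


# Usage with made-up data: one item's text, which users liked it, and its tags.
tokens = [3, 17, 42, 8]
likes = np.array([1.0, 0.0, 0.0, 1.0])
tags = np.array([0.0, 1.0, 0.0, 0.0, 1.0])
loss = multitask_loss(tokens, likes, tags, params)

# Cold start: an unseen item gets a factor, and scores, from its text alone.
new_item_scores = params["users"] @ encode_text([5, 9, 21], params)
```

Because the item factor is computed from the token sequence rather than looked up in a per-item table, a new item can be scored for every user without any observed ratings. Unlike an average of word embeddings, the recurrent encoder also produces different factors for the same words in a different order.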
