Investigating the Value of Subtitles for Improved Movie Recommendations

Collaborative filtering (CF) is a highly effective recommendation approach based on preference patterns observed in user-item interaction data. Since pure collaborative methods have certain limitations, e.g., when the data is sparse, hybrid approaches are a common solution, as they combine collaborative information with side-information (SI) about the items. In this work, we explore the value of subtitle information for the problem of movie recommendation. Unlike previously explored types of movie SI, e.g., titles or synopses, subtitles are not only longer, but also contain unique information that may help us predict more accurately whether a user will enjoy a movie. To assess the usefulness of subtitles, we propose a technical framework named SubtitleCF that combines user and item embeddings derived from interaction data and SI. The subtitles may be embedded in different ways, e.g., with Latent Dirichlet Allocation (LDA) or with neural techniques. Computational experiments with a framework instantiation that relies on Bayesian Personalized Ranking (BPR) as an industry-strength method for item ranking and on different text embedding methods demonstrate the value of subtitles in terms of prediction accuracy and coverage. Moreover, a user study (N=247) reveals that the information contained in subtitles can be leveraged to improve the decision-making processes of users.
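To make the idea of combining collaborative and subtitle-derived embeddings under a BPR objective more concrete, the following is a minimal sketch and not the SubtitleCF implementation. It assumes that an item vector is formed by concatenating a learned CF embedding with a fixed text embedding (e.g., an LDA topic vector); the abstract does not state how the two are combined, so this choice, along with all function names and the learning rate, is hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def hybrid_item_vector(cf_vec, text_vec):
    """ASSUMPTION: combine the CF and subtitle embeddings by concatenation."""
    return list(cf_vec) + list(text_vec)


def bpr_loss(user_vec, pos_item_vec, neg_item_vec):
    """BPR pairwise loss -log(sigmoid(x_ui - x_uj)) for one (u, i, j) triple."""
    diff = dot(user_vec, pos_item_vec) - dot(user_vec, neg_item_vec)
    # -log(sigmoid(d)) = log(1 + exp(-d)), computed without overflow.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def sgd_step_user(user_vec, pos_item_vec, neg_item_vec, lr=0.1):
    """One gradient-descent step on the user vector only (items held fixed)."""
    diff = dot(user_vec, pos_item_vec) - dot(user_vec, neg_item_vec)
    scale = lr * sigmoid(-diff)  # magnitude of the negative gradient
    return [u + scale * (p - n)
            for u, p, n in zip(user_vec, pos_item_vec, neg_item_vec)]


# Toy data: 2 CF dimensions + 2 subtitle-topic dimensions per item.
user = [0.1, -0.2, 0.3, 0.0]
liked = hybrid_item_vector([0.5, 0.1], [0.2, 0.4])      # item the user interacted with
unseen = hybrid_item_vector([-0.3, 0.2], [0.1, -0.1])   # sampled negative item

before = bpr_loss(user, liked, unseen)
user = sgd_step_user(user, liked, unseen)
after = bpr_loss(user, liked, unseen)
print(before > after)
```

In this sketch only the user vector is updated; a fuller instantiation would also update the CF part of each item vector while leaving the subtitle part fixed or passing it through a learned projection.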
