Comparing Topic Models for a Movie Recommendation System

Recommendation systems have become successful at suggesting content that is likely to be of interest to the user; however, their performance suffers greatly when little information about the user's preferences is available. In this paper we propose an automated movie recommendation system based on the similarity of movies: given a target movie selected by the user, the goal of the system is to provide a list of the movies most similar to it, without knowing any user preferences. The topic models Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) have been applied and extensively compared on a movie database of two hundred thousand plots. Experiments are an important part of the paper: we examined the behaviour of the topic models using standard metrics and user evaluations, and we conducted performance assessments with 30 users to compare our approach with a commercial system. The outcome was that LSA outperformed LDA in supporting the selection of similar plots. Even if our system does not outperform commercial systems, it does not rely on human effort, so it can be ported to any domain where natural language descriptions exist. Since it is independent of the number of user ratings, it can suggest famous movies as well as old or little-known movies that are still strongly related to the content of the video the user has watched.
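The core retrieval step described above can be sketched for LSA. In that step, every other plot is ranked by its similarity to the target plot in a latent topic space. This is a minimal pure-Python illustration under our own assumptions, not the authors' implementation. The tokenizer, the smoothed TF-IDF weighting, the use of cosine similarity, the helper names (`tfidf_rows`, `lsa_doc_vectors`, `recommend`) and the toy plots are all hypothetical.

The rank-k decomposition comes from the eigenvectors of the document Gram matrix X X^T, whose eigenvalues are the squared singular values of X. These eigenvectors are computed with Jacobi rotations. That approach only suits toy corpora; a collection of two hundred thousand plots would instead need a sparse truncated SVD, such as gensim's `LsiModel`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical, deliberately simple tokenizer: lowercase alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_rows(plots):
    """One TF-IDF row per plot over a shared, sorted vocabulary (assumed weighting)."""
    docs = [tokenize(p) for p in plots]
    vocab = sorted({w for d in docs for w in d})
    df = Counter(w for d in docs for w in set(d))
    idf = {w: math.log(len(docs) / df[w]) + 1.0 for w in vocab}  # smoothed IDF
    tfs = [Counter(d) for d in docs]
    return [[tf[w] * idf[w] for w in vocab] for tf in tfs]


def jacobi_eigen(a, max_sweeps=50, tol=1e-12):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.
    Returns (eigenvalues, V); column i of V is the eigenvector of eigenvalue i."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the rotated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V P
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def lsa_doc_vectors(rows, k):
    """Rank-k LSA coordinates (U_k S_k) of each document, via the Gram matrix X X^T."""
    n = len(rows)
    gram = [[sum(x * y for x, y in zip(rows[i], rows[j])) for j in range(n)] for i in range(n)]
    eigvals, eigvecs = jacobi_eigen(gram)
    top = sorted(range(n), key=lambda i: eigvals[i], reverse=True)[:k]
    # Eigenvalues of X X^T are squared singular values, so take square roots.
    return [[math.sqrt(max(eigvals[c], 0.0)) * eigvecs[d][c] for c in top] for d in range(n)]


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def recommend(target, plots, k=2):
    """Other plots ranked by cosine similarity to plots[target] in the k-dim LSA space.
    No user ratings are used: only the plot text enters the computation."""
    vecs = lsa_doc_vectors(tfidf_rows(plots), k)
    scores = [(i, cosine(vecs[target], vecs[i])) for i in range(len(plots)) if i != target]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy plots: two crime stories and two space stories.
plots = [
    "detective murder city police investigation",
    "police detective murder investigation killer",
    "astronaut spaceship planet space mission",
    "spaceship crew space planet alien",
]
print(recommend(0, plots))  # the other crime plot (index 1) ranks first
```

Because only the plot descriptions are used, an obscure title is ranked exactly like a popular one. This illustrates the independence from user ratings claimed above.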
