Analysis and Evaluation of Similarity Metrics in Collaborative Filtering Recommender System

KEMI-TORNIO UNIVERSITY OF APPLIED SCIENCES
Degree programme: Business Information Technology
Writer: Guo, Shuhang
Thesis title: Analysis and evaluation of similarity metrics in collaborative filtering recommender system
Pages (of which appendix): 62 (1)
Date: May 15, 2014
Thesis instructor: Ryabov, Vladimir

This research focuses on the field of recommender systems. The general aims of this thesis are to summarize the state of the art in recommender systems, to evaluate the efficiency of traditional similarity metrics on various data sets, and to propose an approach for modeling new similarity metrics.

The literature on recommender systems was studied to summarize the current developments in this field. The recommendation and evaluation were implemented with Apache Mahout, which provides an open source platform for recommender engines. By importing data into the project, a customized recommender engine was built. Since the results of a collaborative filtering recommender rely significantly on the choice of similarity metric and the type of data, several traditional similarity metrics provided in Apache Mahout were examined with the evaluator offered in the project, using five data sets collected by academic groups.

From the evaluation, I found that each similarity metric achieved its best performance when its adjustable parameters were optimized. The features of each similarity metric were obtained and analyzed with practical data sets. In addition, an approach that combines two traditional metrics was proposed in the thesis, and it was shown to be applicable and efficient through the combination of Pearson correlation and Euclidean distance.

The observation and evaluation of traditional similarity metrics with practical data helps in understanding their features and suitability, from which new models can be created. In addition, the approach proposed for modeling new similarity metrics can be useful both theoretically and practically.
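
To make the idea of combining Pearson correlation and Euclidean distance more concrete, the following minimal Python sketch shows one possible way to blend the two. The abstract does not state how the thesis combines them, so the weighted average, the weight `alpha`, the mapping `1 / (1 + distance)` from distance to similarity, and all function names are assumptions made for illustration only. The sketch does not use Apache Mahout and is not the thesis implementation.

```python
from math import sqrt


def pearson_similarity(a, b):
    """Pearson correlation over the items that both users rated.

    `a` and `b` map item ids to ratings. Returns 0.0 when fewer than
    two items are shared or when either user's ratings have no variance.
    """
    common = sorted(set(a) & set(b))
    n = len(common)
    if n < 2:
        return 0.0
    xs = [a[i] for i in common]
    ys = [b[i] for i in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def euclidean_similarity(a, b):
    """Similarity derived from Euclidean distance over shared items.

    Uses 1 / (1 + distance), an assumed mapping that gives 1.0 for
    identical ratings and approaches 0.0 as the distance grows.
    """
    common = set(a) & set(b)
    if not common:
        return 0.0
    dist = sqrt(sum((a[i] - b[i]) ** 2 for i in common))
    return 1.0 / (1.0 + dist)


def combined_similarity(a, b, alpha=0.5):
    """Hypothetical blend: weighted average of the two similarities.

    `alpha` is an illustrative weight in [0, 1]; it is not a parameter
    taken from the thesis.
    """
    return (alpha * pearson_similarity(a, b)
            + (1.0 - alpha) * euclidean_similarity(a, b))


if __name__ == "__main__":
    alice = {"m1": 5, "m2": 3, "m3": 4}
    bob = {"m1": 5, "m2": 3, "m3": 4, "m4": 1}
    carol = {"m1": 1, "m2": 5, "m3": 3}
    print(combined_similarity(alice, bob))
    print(combined_similarity(alice, carol))
```

Pearson correlation captures whether two users' ratings rise and fall together, while the distance-based term captures how close the rating values are in absolute terms, so a blend of this kind is sensitive to both. In this sketch `alpha` would be tuned on held-out data, in the same spirit as the adjustable parameters mentioned above.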
