Paper information - Analysis and Evaluation of Similarity Metrics in Collaborative Filtering Recommender System

Analysis and Evaluation of Similarity Metrics in Collaborative Filtering Recommender System

KEMI-TORNIO UNIVERSITY OF APPLIED SCIENCES
Degree programme: Business Information Technology
Writer: Guo, Shuhang
Thesis title: Analysis and evaluation of similarity metrics in collaborative filtering recommender system
Pages (of which appendix): 62 (1)
Date: May 15, 2014
Thesis instructor: Ryabov, Vladimir

This research focuses on the field of recommender systems. The general aims of this thesis are to summarize the state of the art in recommender systems, to evaluate the efficiency of traditional similarity metrics on a variety of data sets, and to propose an approach for modeling new similarity metrics. The literature on recommender systems was studied to summarize current developments in this field. The recommendation and evaluation were implemented with Apache Mahout, which provides an open source recommender engine platform. By importing data into the project, a customized recommender engine was built. Since the results of a collaborative filtering recommender depend significantly on the choice of similarity metric and on the type of data, several traditional similarity metrics provided in Apache Mahout were examined with the evaluator offered in the project, using five data sets collected by academic groups. From the evaluation, I found that the best performance of each similarity metric was achieved by optimizing its adjustable parameters. The features of each similarity metric were obtained and analyzed with practical data sets. In addition, an approach that combines two traditional metrics was proposed in the thesis, and it was shown to be applicable and efficient by combining Pearson correlation and Euclidean distance. Observing and evaluating traditional similarity metrics on practical data helps in understanding their features and suitability, from which new models can be created. Moreover, the approach proposed for modeling new similarity metrics can be useful both theoretically and practically.
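To make the idea of combining two traditional metrics concrete, here is a minimal Python sketch of Pearson correlation and a Euclidean-distance similarity computed over the items two users have both rated, followed by a weighted blend of the two. The abstract does not say how the thesis combines the metrics, so the function `combined_similarity`, its `weight` parameter, the rescaling of Pearson to [0, 1], and the `1 / (1 + d)` distance-to-similarity conversion are all assumptions made for illustration. They are not the author's method and not Apache Mahout's implementation.

```python
from math import sqrt


def co_rated(a, b):
    """Return the items rated by both users, in a stable order."""
    return sorted(set(a) & set(b))


def pearson_similarity(a, b):
    """Pearson correlation over co-rated items; 0.0 when it is undefined."""
    items = co_rated(a, b)
    n = len(items)
    if n < 2:
        return 0.0
    mean_a = sum(a[i] for i in items) / n
    mean_b = sum(b[i] for i in items) / n
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in items)
    var_a = sum((a[i] - mean_a) ** 2 for i in items)
    var_b = sum((b[i] - mean_b) ** 2 for i in items)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def euclidean_similarity(a, b):
    """Map Euclidean distance d over co-rated items to 1 / (1 + d).

    Assumption: this conversion is one common convention, chosen here
    only so that larger values mean "more similar".
    """
    items = co_rated(a, b)
    if not items:
        return 0.0
    d = sqrt(sum((a[i] - b[i]) ** 2 for i in items))
    return 1.0 / (1.0 + d)


def combined_similarity(a, b, weight=0.5):
    """Hypothetical blend: weighted mean of the two similarities.

    Pearson is rescaled from [-1, 1] to [0, 1] so both terms share a range.
    """
    pearson_01 = (pearson_similarity(a, b) + 1.0) / 2.0
    return weight * pearson_01 + (1.0 - weight) * euclidean_similarity(a, b)


# Toy ratings (made-up values, not taken from the thesis data sets).
alice = {"m1": 5, "m2": 3, "m3": 4}
bob = {"m1": 4, "m2": 2, "m3": 3, "m4": 5}

print(pearson_similarity(alice, bob))
print(euclidean_similarity(alice, bob))
print(combined_similarity(alice, bob, weight=0.5))
```

In the toy data, Bob rates every shared item exactly one point lower than Alice. Pearson treats the two users as perfectly correlated, while the Euclidean similarity penalizes the constant offset. That difference in behavior is one reason a blend of the two might be of interest.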

Shuhang Guo
