Shilling attack detection utilizing semi-supervised learning method for collaborative recommender system

Collaborative filtering (CF) can generate personalized recommendations. However, recommender systems that use CF as their core algorithm are vulnerable to shilling attacks, which inject malicious user profiles into the system to push or nuke the reputations of targeted items. In most practical recommender systems only a small number of users are labeled, while a large number remain unlabeled because obtaining their identities is expensive. This paper proposes Semi-SAD, a new shilling attack detection algorithm based on semi-supervised learning that takes advantage of both types of data. It first trains a naïve Bayes classifier on a small set of labeled users, and then incorporates unlabeled users with EM-λ to improve the initial naïve Bayes classifier. Experiments on MovieLens datasets compare the efficiency of Semi-SAD with a supervised learning based detector and an unsupervised learning based detector. The results indicate that Semi-SAD detects various kinds of shilling attacks better than the other two detectors, especially obfuscated and hybrid shilling attacks.
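
The two-stage procedure described above (an initial naïve Bayes fit on labeled users, then EM refinement with unlabeled users down-weighted by λ) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes each user profile has already been reduced to a numeric feature vector, and it uses a Gaussian naïve Bayes model for those features, which is a choice made here for illustration only. The function names `m_step`, `e_step`, and `semi_supervised_nb`, the parameters `lam`, `n_iter`, and `var_floor`, and the label convention (0 = genuine, 1 = attack) are all hypothetical.

```python
import numpy as np


def m_step(X, resp, weight, var_floor=1e-3):
    """Re-estimate Gaussian naive Bayes parameters from soft class memberships.

    X: (n, d) features, resp: (n, k) class responsibilities,
    weight: (n,) per-user weight (1 for labeled users, lam for unlabeled ones).
    """
    w = resp * weight[:, None]                       # weighted memberships, (n, k)
    mass = w.sum(axis=0)                             # effective count per class, (k,)
    prior = mass / mass.sum()
    mean = (w.T @ X) / mass[:, None]                 # (k, d)
    sq = (X[:, None, :] - mean[None, :, :]) ** 2     # (n, k, d)
    var = (w[:, :, None] * sq).sum(axis=0) / mass[:, None] + var_floor
    return prior, mean, var


def e_step(X, prior, mean, var):
    """Posterior class probabilities under the naive (per-feature independent) model."""
    diff = X[:, None, :] - mean[None, :, :]
    log_lik = -0.5 * (np.log(2 * np.pi * var)[None, :, :]
                      + diff ** 2 / var[None, :, :]).sum(axis=2)
    log_post = log_lik + np.log(prior)[None, :]
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


def semi_supervised_nb(X_lab, y_lab, X_unl, lam=0.5, n_iter=10, n_classes=2):
    """Train on labeled users, then refine with unlabeled users weighted by lam."""
    resp_lab = np.eye(n_classes)[y_lab]              # labeled users keep hard labels
    params = m_step(X_lab, resp_lab, np.ones(len(X_lab)))  # initial classifier
    X_all = np.vstack([X_lab, X_unl])
    weight = np.concatenate([np.ones(len(X_lab)), np.full(len(X_unl), lam)])
    for _ in range(n_iter):
        resp_unl = e_step(X_unl, *params)            # E-step: soft-label unlabeled users
        resp = np.vstack([resp_lab, resp_unl])
        params = m_step(X_all, resp, weight)         # M-step: weighted re-estimation
    return params
```

Under these assumptions, `e_step(X_new, *params).argmax(axis=1)` would give the predicted class of each new profile. Setting `lam=0` reduces the loop to the purely supervised classifier, and `lam=1` treats unlabeled users as equal to labeled ones, so λ controls how much the unlabeled profiles are allowed to move the model.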
