Detecting LinkedIn Spammers and Their Spam Nets

Spam is one of the main problems of the WWW. Many studies have characterised and detected several types of Spam (mainly Web Spam, Email Spam, Forum/Blog Spam and Social Networking Spam). Nevertheless, to the best of our knowledge, no study has addressed the detection of Spam in LinkedIn. In this article, we propose a method for detecting Spammers and Spam nets in the LinkedIn social network. As no public or private LinkedIn datasets are available in the literature, we have manually built a dataset of real LinkedIn users and classified each of them as a Spammer or a legitimate user. The proposed method for detecting LinkedIn Spammers consists of a set of new heuristics, which are combined using a kNN classifier. Moreover, we propose a method for detecting Spam nets (fake companies) in LinkedIn, based on the idea that the profiles of these companies share content similarities. Both methods proved very effective: we achieved an F-Measure of 0.971 and an AUC close to 1 in the detection of Spammer profiles, and an F-Measure of 1 in the detection of Spam nets.
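To make the classification step concrete, the following is a minimal sketch of combining heuristic scores with a kNN classifier and scoring the result with the F-Measure. It is an illustration under stated assumptions, not the authors' implementation: the two-dimensional feature vectors, their values, the choice of k = 3, the Euclidean distance and the function names `knn_predict` and `f_measure` are all hypothetical, since the abstract does not specify the heuristics or the classifier settings.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (feature_vector, label) pairs. Each feature vector
    holds hypothetical heuristic scores for one profile.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def f_measure(y_true, y_pred, positive="spammer"):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labelled profiles: invented values for two assumed heuristic scores.
train = [
    ((0.90, 0.80), "spammer"),
    ((0.80, 0.90), "spammer"),
    ((0.70, 0.95), "spammer"),
    ((0.10, 0.20), "legitimate"),
    ((0.20, 0.10), "legitimate"),
    ((0.15, 0.30), "legitimate"),
]

if __name__ == "__main__":
    print(knn_predict(train, (0.85, 0.85)))
    print(knn_predict(train, (0.10, 0.10)))
```

In practice the feature vector would contain one entry per heuristic, and k and the distance function would be chosen by validation on the labelled dataset.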
