A sock puppet detection algorithm on virtual spaces

In virtual spaces, some individuals use multiple usernames or impersonate other users to communicate with others; such identities are usually called "sock puppets". Sock puppets are fake identities through which members of an Internet community, pretending to be a different person, praise a product or their own work or create the illusion of support for it. A fundamental problem is how to identify these sock puppets. In this paper, we propose a sock puppet detection algorithm that combines authorship-identification techniques with link analysis. First, we propose a social network model in which a link between two IDs is built if they hold similar attitudes on most of the topics in which both participate; then, the edges are pruned according to a hypothesis test that takes their writing styles into account; finally, link-based community detection is performed on the pruned network. Compared to traditional methods, our approach has three advantages: (1) it conforms to the practical meaning of a sock puppet community; (2) it can be applied in online settings; (3) it increases the efficiency of link analysis. In our experiments, we evaluate the method on real datasets and compare it with several previous methods; the results confirm these advantages.
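The three-step pipeline described in the abstract (attitude-based linking, style-based edge pruning, link-based grouping) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Attitudes are assumed to be +1 (support) or -1 (oppose) per topic, and "most topics" is read as "more than half of the shared topics". The paper's writing-style hypothesis test is replaced by a permutation test on one hypothetical style feature (mean word length per message). Its community detection step is replaced by connected components of the pruned graph. All function names, thresholds (`min_shared`, `min_agree`, `alpha`) and the toy data are hypothetical.

```python
import itertools
import random
from statistics import mean


def attitude_agreement(a, b):
    """Fraction of shared topics where two IDs take the same stance (+1/-1 assumed)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0, 0
    agree = sum(1 for t in shared if (a[t] > 0) == (b[t] > 0))
    return agree / len(shared), len(shared)


def build_network(attitudes, min_shared=2, min_agree=0.5):
    """Step 1: link two IDs if they agree on more than min_agree of their shared topics."""
    edges = set()
    for u, v in itertools.combinations(sorted(attitudes), 2):
        frac, n = attitude_agreement(attitudes[u], attitudes[v])
        if n >= min_shared and frac > min_agree:
            edges.add((u, v))
    return edges


def style_feature(message):
    """Hypothetical style feature: mean word length of one message."""
    words = message.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of means (stand-in test)."""
    rng = random.Random(seed)
    observed = abs(mean(xs) - mean(ys))
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(xs)]) - mean(pooled[len(xs):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def prune_edges(edges, messages, alpha=0.05):
    """Step 2: keep an edge only if the test does not reject 'same writing style'."""
    kept = set()
    for u, v in edges:
        xs = [style_feature(m) for m in messages[u]]
        ys = [style_feature(m) for m in messages[v]]
        if permutation_p_value(xs, ys) >= alpha:
            kept.add((u, v))
    return kept


def communities(nodes, edges):
    """Step 3 (stand-in): connected components with 2+ IDs are candidate sock puppet groups."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, groups = set(), []
    for n in sorted(adj):
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        if len(comp) > 1:
            groups.append(sorted(comp))
    return groups


# Hypothetical toy data: "alice2" mirrors alice's stances and style;
# "bob" shares her stances but writes very differently; "carol" disagrees.
attitudes = {
    "alice": {"t1": 1, "t2": -1, "t3": 1},
    "alice2": {"t1": 1, "t2": -1, "t3": 1},
    "bob": {"t1": 1, "t2": -1, "t3": 1},
    "carol": {"t1": -1, "t2": 1},
}
messages = {
    "alice": ["we do it", "so we go", "it is ok", "we can do", "all of us"],
    "alice2": ["go on now", "do it so", "we are in", "as we do", "up to me"],
    "bob": ["considerable enthusiasm throughout",
            "extraordinarily sophisticated implementation",
            "unquestionably remarkable performance",
            "comprehensive documentation everywhere",
            "substantially improved reliability"],
    "carol": ["no way", "not for me"],
}

edges = build_network(attitudes)
kept = prune_edges(edges, messages)
print(communities(attitudes.keys(), kept))
```

On this toy data, bob is linked to both alice accounts by attitude alone, but the style test removes those edges, so only `["alice", "alice2"]` survives as a candidate group. Linking on attitudes first keeps the number of pairwise style tests small, which mirrors the efficiency argument in the abstract.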
