On Differences in the Tagging Behaviour of Spammers and Regular Users

In recent literature, several models have been proposed for reproducing and understanding the tagging behavior of regular users. Until now, all of them have been evaluated by visually comparing their ability to reproduce characteristic properties found in tagging systems. This paper is the first to apply statistical methods for comparing the different tagging models and for measuring the statistical significance of the results. Our evaluation also shows that spammers have a significant influence on the characteristic properties of tagging systems. This indicates that spammers violate basic assumptions about regular users' tagging behavior, so existing models need to be extended to take the behavior of spammers into account.
