Semantic stability in social tagging streams

One potential disadvantage of social tagging systems is that, lacking a centralized vocabulary, a crowd of users may never reach a consensus on how to describe resources (e.g., books, users or songs) on the Web. Yet previous research has provided evidence that the tag distributions of resources may become semantically stable over time as more and more users tag them. At the same time, previous work has raised an array of new questions: (i) How can we assess the semantic stability of social tagging systems in a robust and methodical way? (ii) Does semantic stabilization of tags vary across different social tagging systems? (iii) What factors can explain semantic stabilization in such systems? In this work we tackle these questions by (i) presenting a novel and robust method that overcomes a number of limitations of existing methods, (ii) empirically investigating semantic stabilization processes in a wide range of social tagging systems with distinct domains and properties, and (iii) detecting potential causes of semantic stabilization, specifically imitation behavior, shared background knowledge and intrinsic properties of natural language. Our results show that tagging streams generated by a combination of imitation dynamics and shared background knowledge stabilize faster and reach higher semantic stability than tagging streams generated by imitation dynamics or natural language phenomena alone.
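
To make the notion of a tag distribution "becoming stable" concrete, the following minimal sketch compares the distribution of a single resource's tags after t tag assignments with the distribution after t + step assignments. It is an illustration under stated assumptions, not the method presented in this work: cosine similarity is a stand-in measure chosen for brevity, and the function names, the `step` parameter and the example stream are all hypothetical.

```python
from collections import Counter
from math import sqrt


def tag_distribution(tags):
    """Relative frequency of each tag in a list of tag assignments."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse distributions stored as dicts."""
    dot = sum(p[t] * q.get(t, 0.0) for t in p)
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def stability_curve(stream, step):
    """Similarity of the distributions after t and t + step assignments.

    Stand-in measure (assumption): cosine similarity, not the measure
    used in the paper.
    """
    curve = []
    for t in range(step, len(stream) - step + 1, step):
        before = tag_distribution(stream[:t])
        after = tag_distribution(stream[:t + step])
        curve.append((t, cosine_similarity(before, after)))
    return curve


# Hypothetical tagging stream for one resource (invented data).
stream = ["python", "code", "python", "tutorial", "python", "code",
          "python", "code", "python", "tutorial", "python", "code"]
curve = stability_curve(stream, step=3)
for t, similarity in curve:
    print(t, round(similarity, 3))
```

Values close to 1 at successive points suggest that additional tag assignments no longer change the relative proportions of the tags. A frequency-based measure like this one treats all tags alike, so it is only a rough proxy for the stabilization processes studied here.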
