The complex dynamics of collaborative tagging

The debate within the Web community over the best way to organize information often pits formalized classifications against distributed collaborative tagging systems. A number of questions about collaborative tagging systems remain unanswered, however, including whether coherent categorization schemes can emerge from unsupervised tagging by users. This paper uses data from the social bookmarking site del.icio.us to examine the dynamics of collaborative tagging systems. In particular, we examine whether the frequency distribution of tags for "popular" sites with a long history (many tags and many users) can be described by a power law distribution, which is often characteristic of what are considered complex systems. We produce a generative model of collaborative tagging in order to understand the basic dynamics behind tagging, including how a power law distribution of tags could arise. We empirically examine the tagging history of sites to determine how this distribution arises over time and what patterns precede a stable distribution. Lastly, by focusing on the high-frequency tags of a site where the distribution of tags is a stabilized power law, we show how tag co-occurrence networks for a sample domain of tags can be used to analyze the meaning of particular tags given their relationship to other tags.
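As a rough illustration of how a power law distribution of tags could arise from simple dynamics, the sketch below simulates a "rich get richer" process and measures the slope of the resulting rank-frequency curve on log-log axes. It is not the generative model developed in the paper: the copy-or-invent rule, the `p_new` parameter, and the function names `simulate_tagging` and `loglog_slope` are all assumptions made for this example. A least-squares slope is only a quick diagnostic, not a rigorous test of power law behavior.

```python
import math
import random
from collections import Counter


def simulate_tagging(n_steps, p_new=0.05, seed=0):
    """Hypothetical model (an assumption, not the paper's): at each step a
    user invents a new tag with probability p_new, and otherwise reuses an
    earlier tag chosen in proportion to how often it has been used."""
    rng = random.Random(seed)
    history = [0]  # tag ids in order of use, seeded with a single tag
    next_tag = 1
    for _ in range(n_steps):
        if rng.random() < p_new:
            history.append(next_tag)
            next_tag += 1
        else:
            # A uniform draw from the history is a frequency-weighted
            # draw over the distinct tags.
            history.append(rng.choice(history))
    return Counter(history)


def loglog_slope(counts):
    """Least-squares slope of log(frequency) against log(rank)."""
    freqs = sorted(counts.values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


counts = simulate_tagging(20000)
slope = loglog_slope(counts)
print(f"{len(counts)} distinct tags, log-log slope {slope:.2f}")
```

A roughly straight rank-frequency line on log-log axes, with a negative slope, is what a power law would look like under this diagnostic. Applying `loglog_slope` to observed tag counts for a single bookmarked site would give a comparable figure for real data.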
