Probabilistic Generative Models of the Social Annotation Process

With the rapid growth of social tagging services such as Delicious and CiteULike over the past few years, there is increasing interest in modeling and mining these social systems to derive their implicit collective intelligence. In this paper, we propose and explore two probabilistic generative models of the social annotation (or tagging) process, with an emphasis on user participation. These models leverage the social communities that are implicit in these tagging services. We compare the proposed models with two prominent probabilistic topic models, Latent Dirichlet Allocation and Pachinko Allocation, through an experimental study of the popular Delicious tagging service. We find that the proposed community-based annotation models identify more coherent implicit structures than the alternatives and are better suited to handling unseen social annotation data.
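To make the idea of a community-based generative annotation process concrete, here is a minimal sketch under stated assumptions. It is not the authors' model, whose details are not given here. The sketch assumes the following structure:

- Each resource has a Dirichlet-distributed mixture over K latent communities.
- Every post (one user annotating one resource) is attributed to a single community drawn from that mixture.
- All tags in that post are drawn from that community's tag distribution.

This contrasts with Latent Dirichlet Allocation, which draws a topic independently for every tag token. The names `sample_dirichlet` and `generate_annotations`, the hyperparameters `alpha` and `beta`, and the toy vocabulary are all hypothetical.

```python
import random
from typing import List, Tuple

# Hypothetical illustration of a community-based annotation process;
# not the paper's model. Each post is (resource_id, community_id, tags).
Post = Tuple[int, int, List[str]]


def sample_dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    """Draw one Dirichlet sample by normalizing independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    if total == 0.0:  # guard against (extremely unlikely) underflow
        return [1.0 / len(alpha)] * len(alpha)
    return [d / total for d in draws]


def generate_annotations(
    num_communities: int,
    vocab: List[str],
    num_resources: int,
    posts_per_resource: int,
    tags_per_post: int,
    alpha: float = 0.5,  # assumed symmetric prior on resource-community mixtures
    beta: float = 0.1,   # assumed symmetric prior on community-tag distributions
    seed: int = 0,
) -> List[Post]:
    rng = random.Random(seed)
    communities = range(num_communities)
    # phi[c]: distribution over the tag vocabulary for community c
    phi = [sample_dirichlet([beta] * len(vocab), rng) for _ in communities]
    corpus: List[Post] = []
    for r in range(num_resources):
        # theta: how strongly each community is drawn to resource r
        theta = sample_dirichlet([alpha] * num_communities, rng)
        for _ in range(posts_per_resource):
            # One community per post (a user's whole annotation), not per tag token.
            c = rng.choices(communities, weights=theta)[0]
            tags = rng.choices(vocab, weights=phi[c], k=tags_per_post)
            corpus.append((r, c, tags))
    return corpus


if __name__ == "__main__":
    vocab = ["python", "programming", "jazz", "music", "recipes", "cooking"]
    for post in generate_annotations(3, vocab, num_resources=2,
                                     posts_per_resource=3, tags_per_post=2):
        print(post)
```

The key design choice here is tying a user's entire annotation to one community. This is what lets community structure, rather than isolated tag co-occurrence, shape the model. Fitting such a model to real data would require an inference procedure that inverts this sampling process, and that procedure is outside the scope of this sketch.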