Latent Dirichlet allocation for tag recommendation

Tagging systems have become major infrastructure on the Web. They let users create tags that annotate and categorize content and share those tags with other users, which is particularly helpful for searching multimedia content. However, because tagging is not constrained by a controlled vocabulary or annotation guidelines, tags tend to be noisy and sparse. New resources annotated by only a few users, in particular, often have rather idiosyncratic tags that do not reflect a common perspective useful for search. In this paper we introduce an approach based on Latent Dirichlet Allocation (LDA) for recommending tags for resources in order to improve search. Resources that have been annotated by many users, and thus have a fairly stable and complete tag set, are used to elicit latent topics, and new resources with only a few tags are then mapped to these topics. Based on this mapping, other tags belonging to those topics can be recommended for the new resource. Our evaluation shows that the approach achieves significantly better precision and recall than association rules, which were suggested in previous work, and that it also recommends more specific tags. Moreover, extending resources with the recommended tags significantly improves search for new resources.
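
The pipeline outlined above can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper. It assumes a small collapsed Gibbs sampler for fitting LDA on the tag sets of well-annotated resources, and a simple fixed-point fold-in for mapping a sparsely tagged resource to topics; both are choices made here for illustration. The function names (`fit_lda`, `infer_topics`, `recommend`), the hyperparameter values, and the toy tag data are all hypothetical.

```python
import random


def sample_index(weights, rng):
    """Draw an index with probability proportional to its weight."""
    r = rng.random() * sum(weights)
    for i, w in enumerate(weights):
        r -= w
        if r < 0:
            return i
    return len(weights) - 1


def fit_lda(resources, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling over tag lists; returns p(tag | topic) per topic."""
    rng = random.Random(seed)
    vocab = sorted({t for tags in resources for t in tags})
    index = {t: i for i, t in enumerate(vocab)}
    V, K = len(vocab), num_topics
    topic_tag = [[0] * V for _ in range(K)]    # tag counts per topic
    topic_total = [0] * K                      # total assignments per topic
    res_topic = [[0] * K for _ in resources]   # topic counts per resource

    def bump(d, w, k, delta):
        topic_tag[k][w] += delta
        topic_total[k] += delta
        res_topic[d][k] += delta

    # Start from a random topic for every tag occurrence.
    assign = []
    for d, tags in enumerate(resources):
        assign.append([int(rng.random() * K) for _ in tags])
        for t, k in zip(tags, assign[d]):
            bump(d, index[t], k, +1)

    for _ in range(iters):
        for d, tags in enumerate(resources):
            for i, t in enumerate(tags):
                w = index[t]
                bump(d, w, assign[d][i], -1)   # remove the current assignment
                weights = [
                    (res_topic[d][k] + alpha)
                    * (topic_tag[k][w] + beta) / (topic_total[k] + V * beta)
                    for k in range(K)
                ]
                assign[d][i] = sample_index(weights, rng)
                bump(d, w, assign[d][i], +1)

    # Smoothed topic-tag distributions, one dict per topic.
    return [
        {vocab[w]: (topic_tag[k][w] + beta) / (topic_total[k] + V * beta)
         for w in range(V)}
        for k in range(K)
    ]


def infer_topics(tags, phi, alpha=0.1, iters=50):
    """Fold-in (an assumption of this sketch): estimate p(topic | new resource)."""
    K = len(phi)
    known = [t for t in tags if t in phi[0]]   # ignore tags unseen in training
    theta = [1.0 / K] * K
    for _ in range(iters):
        counts = [alpha] * K
        for t in known:
            resp = [theta[k] * phi[k][t] for k in range(K)]
            total = sum(resp)
            for k in range(K):
                counts[k] += resp[k] / total
        norm = sum(counts)
        theta = [c / norm for c in counts]
    return theta


def recommend(tags, phi, top_n=3):
    """Rank unseen tags by sum over topics of p(tag | topic) * p(topic | resource)."""
    theta = infer_topics(tags, phi)
    scores = {t: sum(theta[k] * phi[k][t] for k in range(len(phi)))
              for t in phi[0]}
    ranked = sorted((t for t in scores if t not in tags),
                    key=lambda t: (-scores[t], t))
    return ranked[:top_n]


# Hypothetical tag sets of resources annotated by many users.
well_tagged = [
    ["python", "programming", "code", "tutorial"],
    ["python", "code", "software", "programming"],
    ["programming", "software", "code", "tutorial"],
    ["travel", "photo", "beach", "holiday"],
    ["travel", "holiday", "photo", "flickr"],
    ["beach", "holiday", "travel", "flickr"],
]
phi = fit_lda(well_tagged, num_topics=2)
print(recommend(["python", "tutorial"], phi))
```

In this sketch, a new resource carrying only the tags `python` and `tutorial` is scored against every tag in the training vocabulary, and the highest-scoring tags it does not already have are returned. The number of topics and the smoothing parameters `alpha` and `beta` would need tuning on real data.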
