RichVSM: enRiched vector space models for folksonomies

People share millions of resources (photos, bookmarks, videos, etc.) in folksonomies such as Flickr, Delicious, and YouTube. To access and share resources, they annotate them with keywords called tags. Because tags are freely chosen, users may not tag their resources with all the relevant tags, and as a result many resources lack a sufficient number of relevant tags. This lack of relevant tags makes the data sparse, and the sparseness makes many relevant resources unsearchable against user queries. In this paper, we explore two dimensions of semantic relationships between tags, based on the context and the distribution of tags. We exploit these semantic relationships to reduce sparseness in folksonomies and propose different enriched vector space models. We also propose a vector space model, Best of Breed, which uses the appropriate enrichment method based on the type of the query. We evaluate the proposed methods on a large dataset of 27 million resources, 92 thousand tags, and 94 million tag assignments. Experimental results show that the enriched vector space models improve search, especially for rare queries that have few relevant resources in the sparse data.
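
To make the idea of enrichment concrete, the following is a minimal sketch and not the method evaluated in the paper. It assumes that the context-based relationship between two tags can be approximated by the cosine similarity of their co-occurrence vectors, and that a resource's tag vector is enriched by adding related tags weighted by that similarity. The toy data, the function names, and the threshold of 0.3 are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy folksonomy (hypothetical data): resource id -> set of tags.
RESOURCES = {
    "r1": {"beach", "sea", "sunset"},
    "r2": {"beach", "sea"},
    "r3": {"sea", "sunset"},
    "r4": {"city", "night"},
    "r5": {"beach"},
}


def tag_context(resources):
    """For each tag, count how often every other tag co-occurs with it."""
    context = defaultdict(Counter)
    for tags in resources.values():
        for t in tags:
            for u in tags:
                if t != u:
                    context[t][u] += 1
    return context


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def enrich(tags, context, threshold=0.3):
    """Original tags get weight 1.0; related tags get their similarity.

    The threshold is an assumed cut-off, not a value from the paper.
    """
    vector = {t: 1.0 for t in tags}
    for t in tags:
        for candidate in context:
            if candidate in tags:
                continue
            sim = cosine(context[t], context[candidate])
            if sim >= threshold:
                vector[candidate] = max(vector.get(candidate, 0.0), sim)
    return vector


def search(query_tags, resources, context, enriched=True):
    """Rank resources by cosine similarity to the query; drop zero scores."""
    query = {t: 1.0 for t in query_tags}
    scores = {}
    for rid, tags in resources.items():
        vec = enrich(tags, context) if enriched else {t: 1.0 for t in tags}
        score = cosine(query, vec)
        if score > 0:
            scores[rid] = score
    return sorted(scores, key=scores.get, reverse=True)


CONTEXT = tag_context(RESOURCES)
plain_hits = search(["sunset"], RESOURCES, CONTEXT, enriched=False)
enriched_hits = search(["sunset"], RESOURCES, CONTEXT, enriched=True)
```

In this toy data, resource `r5` carries only the tag `beach`, so the plain model cannot retrieve it for the query `sunset`. After enrichment it becomes retrievable, because `beach` and `sunset` appear in similar contexts, while the unrelated resource `r4` stays out of the result list.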
