Discovering and exploiting semantics in folksonomies

Folksonomies are Web 2.0 platforms on which users share resources and assign freely chosen keywords, called tags, to categorize and organize them. Different folksonomies support different types of resources, such as websites (Delicious), images (Flickr), and videos (YouTube). Because folksonomies are easy to use, they attract millions of users, but this ease of use also brings problems. This thesis addresses three such problems and proposes solutions for them. The first problem arises when users search for relevant resources: they often fail to find all relevant resources because they do not know which tags are relevant. The second problem is assigning tags to resources. Some folksonomies (like Delicious) recommend tags, which makes tagging easier for their users, whereas others (like Flickr) do not recommend any tags. The third problem is that tags and resources lack semantics. Because tags are freely chosen keywords, they can, for example, be ambiguous. Automatically identifying the semantics of tags and resources helps to reduce the problems that arise from this freedom. This thesis proposes methods that exploit semantics to address search, tag recommendation, and the identification of tag semantics. The semantics are discovered from a variety of sources: web search engines, online social communities, and the co-occurrences of tags. Using different sources for discovering semantics reduces the effort needed to build systems that solve these problems. The proposed methods are evaluated on a large-scale data set. The evaluation results suggest that semantics can be exploited to improve search, tag recommendation, and the automatic identification of the semantics of tags and resources.
