Instance-Based Mapping between Thesauri and Folksonomies

The emergence of web-based systems in which users can annotate items raises the question of semantic interoperability between vocabularies that originate from collaborative annotation processes, often called folksonomies, and keywords assigned in a more traditional way. If a collection is annotated according to two systems, e.g. with tags and keywords, the annotated data can be used for instance-based mapping between the vocabularies. The basis for this kind of matching is an appropriate similarity measure between concepts, based on their distribution as annotations. In this paper we propose a new similarity measure that can take advantage of some special properties of user-generated metadata. We have evaluated this measure on a set of Wikipedia articles that are both classified according to the topic structure of Wikipedia and annotated by users of the bookmarking service del.icio.us. The results obtained with the new measure are significantly better than those obtained with standard similarity measures proposed for this task in the literature, i.e., the new measure correlates better with human judgments. We argue that the measure also has benefits for instance-based mapping of more traditionally developed vocabularies.
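
To make the idea of instance-based mapping concrete, the following is a minimal sketch under stated assumptions. It does not implement the new measure proposed in the paper, which the abstract does not specify. Instead it uses the Jaccard coefficient over sets of co-annotated items as a stand-in baseline. The function names (`invert`, `jaccard`, `best_matches`) and the toy data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def invert(annotations):
    """Turn {item: {terms}} into {term: {items annotated with that term}}."""
    index = defaultdict(set)
    for item, terms in annotations.items():
        for term in terms:
            index[term].add(item)
    return dict(index)


def jaccard(a, b):
    """Jaccard coefficient of two item sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_matches(tags_by_item, keywords_by_item):
    """For each tag, pick the keyword whose annotated items overlap most.

    Assumption: Jaccard overlap stands in for the similarity measure;
    it is a baseline, not the measure proposed in the paper.
    """
    tag_index = invert(tags_by_item)
    kw_index = invert(keywords_by_item)
    if not kw_index:
        return {}
    return {
        tag: max(kw_index, key=lambda kw: jaccard(items, kw_index[kw]))
        for tag, items in tag_index.items()
    }


# Toy collection annotated in two systems (hypothetical data).
tags = {"a1": {"python", "code"}, "a2": {"python"}, "a3": {"jazz"}}
keywords = {"a1": {"Programming"}, "a2": {"Programming"}, "a3": {"Music"}}

mapping = best_matches(tags, keywords)
```

In this toy collection every tag is mapped to the keyword that shares the most annotated items with it. Any other similarity function over item sets or annotation distributions could be substituted for `jaccard` without changing the rest of the sketch.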
