Finding the right term: Retrieving and exploring semantic concepts in astronomical vocabularies

Astronomy, like many domains, already has several sets of terminology in general use, referred to as controlled vocabularies: for example, the keywords used to tag journal articles, or the taxonomy of terms used to label image files. These existing vocabularies can be encoded in SKOS, a W3C proposed recommendation for representing vocabularies on the Semantic Web, so that computer systems can help users search for and discover resources tagged with vocabulary concepts. However, this requires a search mechanism that maps a user-supplied string to a vocabulary concept. In this paper, we present our experiences in implementing the Vocabulary Explorer, a vocabulary search service based on the Terrier Information Retrieval Platform. We investigate the capabilities of existing document weighting models for identifying the correct vocabulary concept for a query. Because a SKOS-encoded vocabulary is highly structured, we also investigate the effects of term weighting (boosting the score of concepts that match on particular fields of a vocabulary concept) and of query expansion. We found that the existing document weighting models provided very high quality results, but that these could be improved further by term weighting that makes use of the semantic evidence.
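
The idea of boosting matches on particular fields of a concept can be illustrated with a minimal sketch. It is not the Vocabulary Explorer's implementation and does not use Terrier or its document weighting models; it simply counts query-term matches per field and multiplies them by a field weight. The field names follow SKOS (`prefLabel`, `altLabel`, `definition`), while the weight values, the `Concept` class, the `ex:` URIs, and the sample concepts are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical field weights: a match on the preferred label counts most.
FIELD_WEIGHTS = {"prefLabel": 3.0, "altLabel": 2.0, "definition": 1.0}


@dataclass
class Concept:
    """A simplified stand-in for a SKOS concept (illustrative only)."""
    uri: str
    prefLabel: str
    altLabel: List[str] = field(default_factory=list)
    definition: str = ""


def tokens(text: str) -> List[str]:
    """Lowercase and split on whitespace; a real system would also stem."""
    return text.lower().split()


def score(concept: Concept, query: str) -> float:
    """Sum, over fields, of (field weight x number of matching tokens)."""
    query_terms = set(tokens(query))
    fields = {
        "prefLabel": tokens(concept.prefLabel),
        "altLabel": [t for label in concept.altLabel for t in tokens(label)],
        "definition": tokens(concept.definition),
    }
    total = 0.0
    for name, field_tokens in fields.items():
        matches = sum(1 for t in field_tokens if t in query_terms)
        total += FIELD_WEIGHTS[name] * matches
    return total


def search(concepts: List[Concept], query: str) -> List[Concept]:
    """Return concepts with a non-zero score, best match first."""
    scored = [(score(c, query), c) for c in concepts]
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]


# Made-up sample vocabulary for demonstration.
SAMPLE = [
    Concept("ex:star", "star", [], "a luminous ball of plasma"),
    Concept("ex:galaxy", "galaxy", ["galaxies"], "a system of stars"),
    Concept("ex:spiralGalaxy", "spiral galaxy", [], "a galaxy with spiral arms"),
]

results = search(SAMPLE, "spiral galaxy")
```

With these assumed weights, a concept whose preferred label contains both query terms ranks above one that matches only a single term, which is the intuition behind field-based boosting; the paper itself evaluates this on top of established weighting models rather than raw match counts.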
