Ontologies and Query Expansion

This master's thesis explores the use of ontologies in information retrieval, and in query expansion in particular. Ontologies are usually huge, hand-coded repositories of concepts and the relations between them, so applying them to information retrieval seems a reasonable goal. We feel that the use of ontologies for query expansion has been overlooked in contemporary literature, as the main related papers predate 2000. In this thesis we attempt to present a query expansion method using ontologies that outperforms non-ontological query expansion methods. Note, however, that the presented approach is not purely ontological; it is a hybrid approach, as it also includes non-ontological steps. We also propose a purely probabilistic query expansion method that outperforms all other methods tested. Finally, we explore ontology-based word sense disambiguation, as it is a prerequisite step for ontological query expansion. The ontology used is WordNet. Our experiments used standard TREC conference data and showed that an ontological approach can improve on non-ontological methods.
