Enhancing Semantic Search using N-Levels Document Representation

The traditional strategy adopted by Information Retrieval (IR) systems is ranked keyword search: for a given query, a list of documents, ordered by relevance, is returned. Relevance computation is primarily driven by a basic string-matching operation. To date, several attempts have been made to deviate from the traditional keyword search paradigm, often by introducing techniques to capture word meanings in documents and queries. The general feeling is that dealing explicitly with only semantic information does not significantly improve the performance of text retrieval systems. This paper presents SENSE (SEmantic N-levels Search Engine), an IR system that tries to overcome the limitations of the ranked keyword approach by introducing semantic levels which integrate (and do not simply replace) the lexical level represented by keywords. Semantic levels provide information about word meanings, as described in a reference dictionary, and about named entities. We show how SENSE is able to manage documents indexed at three separate levels (keywords, word meanings, and entities), as well as to combine keyword search with semantic information provided by the two other indexing levels.
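
The idea of indexing a document at several levels and combining the per-level evidence can be illustrated with a minimal sketch. This is not the SENSE implementation: the class `NLevelIndex`, the level names, the raw term-frequency scoring, and the weighted-sum combination are all assumptions made for illustration, since the abstract does not specify how levels are scored or merged.

```python
from collections import Counter, defaultdict

# Hypothetical level names; the abstract only says keywords, word meanings, entities.
LEVELS = ("keyword", "meaning", "entity")


class NLevelIndex:
    """Toy inverted index that keeps one posting table per level (illustrative only)."""

    def __init__(self):
        # level -> feature -> {doc_id: frequency}
        self.postings = {level: defaultdict(dict) for level in LEVELS}

    def add(self, doc_id, features_by_level):
        """Index a document given its features at each level."""
        for level, features in features_by_level.items():
            for feature, count in Counter(features).items():
                self.postings[level][feature][doc_id] = count

    def search_level(self, level, features):
        """Score documents on a single level by summed feature frequency (assumed scoring)."""
        scores = defaultdict(float)
        for feature in features:
            for doc_id, count in self.postings[level].get(feature, {}).items():
                scores[doc_id] += count
        return scores

    def search(self, query_by_level, weights):
        """Combine per-level scores with a weighted sum (an assumed combination rule)."""
        combined = defaultdict(float)
        for level, features in query_by_level.items():
            for doc_id, score in self.search_level(level, features).items():
                combined[doc_id] += weights.get(level, 0.0) * score
        # Highest score first; ties broken by document id for determinism.
        return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


index = NLevelIndex()
index.add("d1", {"keyword": ["bank", "river"], "meaning": ["bank#river_side"]})
index.add("d2", {"keyword": ["bank", "loan"], "meaning": ["bank#financial_institution"]})

query = {"keyword": ["bank"], "meaning": ["bank#financial_institution"]}
ranking = index.search(query, weights={"keyword": 1.0, "meaning": 1.0, "entity": 1.0})
```

In this toy example both documents match the keyword "bank", but only `d2` also matches the intended meaning, so the semantic level adds to the keyword evidence instead of replacing it. The sense labels such as `bank#financial_institution` are made-up identifiers standing in for entries of a reference dictionary.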
