Semantic search via XML fragments: a high-precision approach to IR

In some IR applications, it is desirable to adopt a high-precision search strategy that returns a small set of documents highly focused on and relevant to the user's information need. With these applications in mind, we investigate semantic search using the XML Fragments query language on text corpora automatically pre-processed to encode semantic information useful for retrieval. We identify three XML Fragments operations that can be applied to a query to conceptualize, restrict, or relate terms in the query. We demonstrate how these operations can be used to address four different query-time semantic needs: to specify the target information type, to disambiguate keywords, to specify search term context, or to relate selected terms in the query. We demonstrate the effectiveness of our semantic search technology through a series of experiments using the two applications in which we embed this technology, and show that it yields significant improvement in the precision of the search results.
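The three operations named above can be illustrated with a minimal sketch that builds query strings as small XML fragments. This is an illustration under stated assumptions, not the system's actual query syntax: the helper names (`conceptualize`, `restrict`, `relate`, `to_query`) and the annotation tags (`Person`, `Product`, `Invent`) are hypothetical, and the sketch assumes the corpus has been pre-annotated with such tags.

```python
import xml.etree.ElementTree as ET


def conceptualize(concept):
    """Hypothetical helper: an empty element stands for 'any span
    annotated with this semantic type', replacing a literal keyword."""
    return ET.Element(concept)


def restrict(concept, keywords):
    """Hypothetical helper: keywords must occur inside a span
    annotated with the given semantic type."""
    element = ET.Element(concept)
    element.text = keywords
    return element


def relate(relation, *arguments):
    """Hypothetical helper: nest argument fragments under a relation
    tag so that they must co-occur within one annotated relation."""
    element = ET.Element(relation)
    element.extend(arguments)
    return element


def to_query(fragment):
    """Serialize a fragment, keeping empty elements as open/close pairs."""
    return ET.tostring(fragment, encoding="unicode", short_empty_elements=False)


# Conceptualize: ask for any person rather than the word "person".
any_person = to_query(conceptualize("Person"))

# Restrict: "Bush" only where it is annotated as a person (disambiguation).
bush_person = to_query(restrict("Person", "Bush"))

# Relate: some person linked to "paper clip" through an invention relation.
inventor = to_query(
    relate("Invent", conceptualize("Person"), restrict("Product", "paper clip"))
)

print(any_person)
print(bush_person)
print(inventor)
```

In this sketch, the empty `Person` element expresses a target information type, the keyword-bearing element expresses disambiguation or context, and the nested fragment expresses a relation between selected terms.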
