Best-match querying from document-centric XML

On the Web, XML is widely used to give lightweight semantics to textual collections. Such document-centric XML collections require a query language that can gracefully handle structural constraints as well as constraints on the free text of the documents. Our main contributions are threefold. First, we outline two fragments of XPath tailored to users who have varying degrees of understanding of the XML structure used, and give both syntactic and semantic characterizations of these fragments. Second, we extend XPath with an about function that has a best-match semantics based on the relevance of the document component to the expressed information need. Third, we evaluate the resulting query language using the INEX 2003 test suite, and show that best-match approaches outperform exact-match approaches for evaluating content-and-structure queries.
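To make the contrast between exact-match and best-match semantics concrete, the following is a minimal sketch under stated assumptions, not the query language or scoring function evaluated in the paper. The sample document, the helper names (`tokens`, `exact_match`, `about`), the smoothing weight `lam`, and the smoothed term-frequency score are all hypothetical choices made for illustration. The exact-match filter keeps only elements that contain every query term, whereas the about-style score ranks every candidate element, so partial matches are still returned.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document; not taken from the INEX collection.
DOC = """
<article>
  <sec><title>Query languages</title><p>XPath selects nodes in XML documents.</p></sec>
  <sec><title>Ranking</title><p>Language models rank XML elements by relevance.</p></sec>
  <sec><title>Cooking</title><p>Boil the pasta for ten minutes.</p></sec>
</article>
"""

def tokens(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    stripped = (t.strip(".,;:").lower() for t in text.split())
    return [t for t in stripped if t]

def element_tokens(elem):
    """Tokens of all text contained in an element and its descendants."""
    return tokens(" ".join(elem.itertext()))

def exact_match(elem, query):
    """Boolean semantics: every query term must occur in the element."""
    words = set(element_tokens(elem))
    return all(q in words for q in tokens(query))

def about(elem, query, coll_counts, coll_len, lam=0.5):
    """Best-match semantics (illustrative): a smoothed term-frequency score.

    Terms absent from the element contribute nothing, so an element
    matching only part of the query still receives a positive score.
    """
    words = element_tokens(elem)
    counts = Counter(words)
    score = 0.0
    for q in tokens(query):
        if counts[q] > 0:
            p_elem = counts[q] / len(words)
            p_coll = coll_counts[q] / coll_len
            score += math.log(1 + (lam * p_elem) / ((1 - lam) * p_coll))
    return score

root = ET.fromstring(DOC)
sections = root.findall(".//sec")  # structural constraint: section elements

# Background statistics over all candidate elements.
coll_counts = Counter(t for s in sections for t in element_tokens(s))
coll_len = sum(coll_counts.values())

query = "xml ranking"

# Exact match: an unordered set of elements satisfying the constraint.
exact_titles = [s.findtext("title") for s in sections if exact_match(s, query)]

# Best match: all candidate elements, ordered by decreasing score.
ranked = sorted(sections,
                key=lambda s: about(s, query, coll_counts, coll_len),
                reverse=True)
ranked_titles = [s.findtext("title") for s in ranked]

print("exact:", exact_titles)
print("ranked:", ranked_titles)
```

In this sketch the structural part of the query is handled by an ordinary path expression (`.//sec`), and only the content constraint is relaxed from a filter to a ranking. A real system would also need to decide how scores from several about clauses are combined, which this example does not address.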
