Articulating information needs in XML query languages

Document-centric XML is a mixture of text and structure. With the increased availability of document-centric XML documents comes a need for query facilities in which both structural constraints and constraints on the content of the documents can be expressed. How does the expressiveness of languages for querying XML documents help users to express their information needs? We address this question from both an experimental and a theoretical point of view. Our experimental analysis compares a structure-ignorant retrieval approach with a structure-aware one, using the test suite of the INEX XML Retrieval Evaluation Initiative. Theoretically, we create two mathematical models of users' knowledge of a set of documents and define query languages which exactly fit these models. One of these languages corresponds to an XML version of fielded search, the other to the INEX query language.

Our main experimental findings are as follows. First, while structure is used in varying degrees of complexity, two-thirds of the queries can be expressed in a fielded-search-like format which does not use the hierarchical structure of the documents. Second, three-quarters of the queries use constraints on the context of the elements to be returned; these contextual constraints cannot be captured by ordinary keyword queries. Third, structure is used as a search hint, and not as a strict requirement, when judged against the underlying information need. Fourth, the use of structure in queries functions as a precision-enhancing device.
