
Processing content-oriented XPath queries

Document-centric XML collections contain text-rich documents, marked up with XML tags that add lightweight semantics to the text. Querying such collections calls for a hybrid query language: the text-rich nature of the documents suggests a content-oriented (IR) approach, while the mark-up allows users to add structural constraints to their IR queries. Hybrid queries tend to be more expressive, which should, in principle, lead to better retrieval performance. In practice, processing these hybrid queries within an IR system turns out to be far from trivial, because a delicate balance must be struck between structural and content information. We propose an approach to processing such hybrid content-and-structure queries that decomposes a query into multiple content-only queries whose results are then combined in ways determined by the structural constraints of the original query. We evaluate our methods using the INEX 2003 test suite and show (1) that effective ways of processing content-oriented XPath queries are non-trivial, (2) that effectiveness differs across topic types, but (3) that with appropriate processing methods, retrieval effectiveness can improve.
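The decomposition idea can be illustrated with a small sketch. It assumes a toy collection of elements identified by a document id and an XPath-like path, uses raw term frequency as a stand-in for a real retrieval model, and handles only a simplified NEXI-style query such as `//article[about(.//abs, xml)]//sec[about(., retrieval)]`: one target clause plus filter clauses checked at document level. All names (`Element`, `content_only_scores`, `process_cas_query`, the `strict` flag) are hypothetical; this is not the authors' implementation, only one possible way to turn a content-and-structure query into content-only subqueries and combine their results.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A retrievable XML element (hypothetical toy representation)."""
    doc: str   # document identifier
    path: str  # XPath-like location, e.g. "/article[1]/bdy[1]/sec[2]"
    text: str  # text content of the element


def last_tag(path):
    """Element name of the last step: "/article[1]/sec[2]" -> "sec"."""
    return path.strip("/").split("/")[-1].split("[")[0]


def content_only_scores(elements, terms):
    """One content-only subquery; raw term frequency stands in for a real IR model."""
    terms = [t.lower() for t in terms]
    scores = {}
    for el in elements:
        tokens = re.findall(r"\w+", el.text.lower())
        score = sum(tokens.count(t) for t in terms)
        if score > 0:
            scores[el] = score
    return scores


def process_cas_query(elements, target_tag, target_terms, filters, strict=True):
    """Decompose a content-and-structure query into content-only subqueries.

    target_tag/target_terms: the element type to return and its own about() clause.
    filters: (tag, terms) pairs for the other about() clauses; as a simplification
    they are checked at document level (same article), not on exact ancestors.
    strict=True drops targets whose document fails any filter;
    strict=False treats filters as vague hints that only add to the score.
    """
    # Subquery 1: target-type elements matching the target's content clause.
    target_hits = {el: s for el, s in content_only_scores(elements, target_terms).items()
                   if last_tag(el.path) == target_tag}

    # Subqueries 2..n: best-scoring matching element per document, per filter.
    support = []
    for tag, terms in filters:
        per_doc = defaultdict(float)
        for el, s in content_only_scores(elements, terms).items():
            if last_tag(el.path) == tag:
                per_doc[el.doc] = max(per_doc[el.doc], s)
        support.append(per_doc)

    # Combine: the structural constraints decide which targets survive and how scores add.
    ranked = []
    for el, s in target_hits.items():
        filter_scores = [per_doc.get(el.doc, 0.0) for per_doc in support]
        if strict and not all(f > 0 for f in filter_scores):
            continue
        ranked.append((el, s + sum(filter_scores)))
    ranked.sort(key=lambda pair: (-pair[1], pair[0].doc, pair[0].path))
    return ranked


# Usage on a tiny hypothetical collection.
docs = [
    Element("d1", "/article[1]/fm[1]/abs[1]", "An XML retrieval model"),
    Element("d1", "/article[1]/bdy[1]/sec[1]", "Retrieval of elements; retrieval scores"),
    Element("d2", "/article[1]/fm[1]/abs[1]", "Image compression"),
    Element("d2", "/article[1]/bdy[1]/sec[1]", "Retrieval of images"),
]
query_filters = [("abs", ["xml"])]
print([(e.doc, s) for e, s in process_cas_query(docs, "sec", ["retrieval"], query_filters)])
# → [('d1', 3)]
print([(e.doc, s) for e, s in process_cas_query(docs, "sec", ["retrieval"], query_filters, strict=False)])
# → [('d1', 3), ('d2', 1.0)]
```

The `strict` switch shows why the combination step matters: the same content-only results yield different rankings depending on whether structural constraints are enforced as hard filters or only used to boost scores.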

Maarten de Rijke | Jaap Kamps | Börkur Sigurbjörnsson
