Paper information - Effective XML content and structure retrieval with relevance ranking

Effective XML content and structure retrieval with relevance ranking

XML documents can be retrieved by means of not only content-only (CO) queries, but also content-and-structure (CAS) queries. Though promising better retrieval precision, CAS queries introduce several new challenges. To address these challenges, we propose a novel approach for XML CAS retrieval. The distinctive feature of the approach is that it adopts a content-oriented point of view. Specifically, the approach first decomposes a CAS query into several fragments, then retrieves results for each query fragment in a content-centric way, and finally scores each answer node. The approach adapts to a variety of homogeneous and heterogeneous data environments. To assess the relevance of retrieval results to a query fragment, we present a scoring strategy that measures relevance from both content and structure perspectives. In addition, an effective approach is proposed to infer answer nodes based on the CAS query and document structure. An efficient algorithm is also presented for CAS retrieval. Finally, we demonstrate the effectiveness of the proposed methods through comprehensive experimental studies.
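The first step named in the abstract, decomposing a CAS query into fragments, can be illustrated with a minimal sketch. The abstract does not give the paper's actual decomposition rules, so everything here is an assumption: the query is taken to be a simplified NEXI-style path of `//tag[about(., terms)]` steps, and the names `Fragment`, `STEP`, and `decompose` are hypothetical. Each fragment pairs the structural path down to a step with that step's content terms.

```python
import re
from typing import List, NamedTuple


class Fragment(NamedTuple):
    """Hypothetical query fragment: a structural path plus its content terms."""
    path: str            # structural context, e.g. "//article//sec"
    keywords: List[str]  # terms taken from the step's about() clause


# One location step: //tag, optionally followed by [about(., terms)].
# This covers only a simplified NEXI-style syntax (an assumption).
STEP = re.compile(r"//(\w+)(?:\[about\(\.,\s*([^)]*)\)\])?")


def decompose(query: str) -> List[Fragment]:
    """Split a CAS query into one fragment per step that has a content clause."""
    fragments = []
    path = ""
    for tag, terms in STEP.findall(query):
        path += "//" + tag
        if terms:  # steps without about() only extend the path
            fragments.append(Fragment(path, terms.split()))
    return fragments


query = "//article[about(., XML retrieval)]//sec[about(., relevance ranking)]"
for fragment in decompose(query):
    print(fragment.path, fragment.keywords)
```

In this sketch each fragment could then be evaluated independently as a keyword search restricted by its path. How the paper retrieves and scores results per fragment is not specified in the abstract and is not modeled here.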

Lei Chen | Xiping Liu | Changxuan Wan

[1] Maarten de Rijke, et al. XML retrieval: what to retrieve?, 2003, SIGIR '03.

[2] Sihem Amer-Yahia, et al. XML retrieval: DB/IR in theory, web in practice, 2007, VLDB.

[3] Sihem Amer-Yahia, et al. Structure and Content Scoring for XML, 2005, VLDB.

[4] Ziyang Liu, et al. Query biased snippet generation in XML search, 2008, SIGMOD Conference.

[5] Yi Chen, et al. Identifying meaningful return information for XML keyword search, 2007, SIGMOD '07.

[6] Tok Wang Ling, et al. Effective XML Keyword Search with Relevance Oriented Ranking, 2009, 2009 IEEE 25th International Conference on Data Engineering.

[7] Rafael Berlanga Llavori, et al. Fragment-based approximate retrieval in highly heterogeneous XML collections, 2008, Data Knowl. Eng.

[8] Wesley W. Chu, et al. Configurable indexing and ranking for XML information retrieval, 2004, SIGIR '04.

[9] Noriko Kando, et al. An empirical study on retrieval models for different document genres: patents and newspaper articles, 2003, SIGIR '03.

[10] Sihem Amer-Yahia, et al. Tree Pattern Relaxation, 2002, EDBT.

[11] Laks V. S. Lakshmanan, et al. FleXPath: flexible structure and full-text querying for XML, 2004, SIGMOD '04.