Multiple sources of evidence for XML retrieval

Document-centric XML collections contain text-rich documents marked up with XML tags. The tags add lightweight semantics to the text. Querying such collections calls for a hybrid query language: the text-rich nature of the documents suggests a content-oriented (IR) approach, while the markup allows users to add structural constraints to their IR queries. We show how evidence for relevance from different sources helps to answer such hybrid queries. We evaluate our methods on the INEX 2003 test set and show that structural hints in hybrid queries help to improve retrieval effectiveness.