A Probabilistic Retrieval Model for Semistructured Data

Retrieving semistructured (XML) data typically requires either a structured query, such as XPath, or a keyword query that ignores structure. In this paper, we infer structural information automatically from keyword queries and incorporate it into a retrieval model. Specifically, we propose the concept of a mapping probability, which maps each query word to a related field (or XML element). The mapping probability is used as a weight to combine the language models estimated from each field. Experiments on two test collections show that our retrieval model based on mapping probabilities significantly outperforms baseline techniques.
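The weighting scheme described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It scores a document as log P(Q|D) = Σ_w log Σ_j P_M(F_j|w) · P(w|D_j). Two choices here are assumptions rather than details taken from the abstract. First, the mapping probability P_M(F_j|w) is estimated with Bayes' rule from collection-level field language models and a uniform field prior. Second, each per-field document model P(w|D_j) uses Dirichlet smoothing with a hypothetical μ = 100. The function names (`field_models`, `mapping_probability`, `field_lm`, `score`) and the two-document toy collection are illustrative only.

```python
from collections import Counter
from math import log

def field_models(docs):
    """Collection-level term counts per field (assumed basis for P(w | F_j))."""
    counts = {}
    for doc in docs:
        for field, text in doc.items():
            counts.setdefault(field, Counter()).update(text.split())
    return counts

def mapping_probability(word, coll_counts, prior=None):
    """Assumed estimate of P_M(F_j | w) via Bayes' rule; uniform field prior by default."""
    fields = list(coll_counts)
    prior = prior or {f: 1.0 / len(fields) for f in fields}
    scores = {}
    for f in fields:
        total = sum(coll_counts[f].values())
        p_w_given_f = coll_counts[f][word] / total if total else 0.0
        scores[f] = p_w_given_f * prior[f]
    z = sum(scores.values())
    if z == 0:
        return dict(prior)  # word unseen in every field: fall back to the prior
    return {f: s / z for f, s in scores.items()}

def field_lm(word, doc, field, coll_counts, mu=100.0):
    """Dirichlet-smoothed P(w | D_j) for one field of one document (assumed smoothing)."""
    tokens = doc.get(field, "").split()
    total = sum(coll_counts[field].values())
    p_coll = coll_counts[field][word] / total if total else 0.0
    return (tokens.count(word) + mu * p_coll) / (len(tokens) + mu)

def score(query, doc, coll_counts, mu=100.0):
    """log P(Q | D): per-field models mixed with mapping probabilities as weights."""
    s = 0.0
    for w in query.split():
        pm = mapping_probability(w, coll_counts)
        p = sum(pm[f] * field_lm(w, doc, f, coll_counts, mu) for f in coll_counts)
        s += log(p) if p > 0 else float("-inf")
    return s

# Hypothetical toy collection with two fields per document.
docs = [
    {"title": "apple pie recipe", "body": "bake the pie with fresh apples"},
    {"title": "apple computer history", "body": "the company built personal computers"},
]
coll = field_models(docs)
print(mapping_probability("recipe", coll))  # "recipe" occurs only in titles
print(score("pie recipe", docs[0], coll) > score("pie recipe", docs[1], coll))
```

In this toy collection, "recipe" appears only in the title field, so all of its mapping weight goes to `title`. "pie" appears in both fields, so its weight is split between them. The structural hint is thus inferred from the keyword query itself, with no XPath-style field restriction.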
