A quasi-synchronous dependence model for information retrieval

Incorporating syntactic features into retrieval models has had very limited success in the past, with the exception of binary term dependencies. This paper presents a new approach to term dependency modeling based on syntactic dependency parsing of both queries and documents. Our model is inspired by a quasi-synchronous stochastic process for machine translation [21]. We model four different types of relationships between syntactically dependent term pairs to perform inexact matching between documents and queries. We also propose a machine learning technique for predicting optimal parameter settings for a retrieval model that incorporates syntactic relationships. Results on TREC collections show that the quasi-synchronous dependence model can improve retrieval performance and, when the predicted optimal parameters are used, outperform a strong state-of-the-art sequential dependence baseline.

[1]  Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.

[2]  John D. Lafferty, et al. Two-stage language models for information retrieval, 2002, SIGIR '02.

[3]  W. Bruce Croft, et al. Query expansion using local and global document analysis, 1996, SIGIR '96.

[4]  W. Bruce Croft, et al. Improving verbose queries using subset distribution, 2010, CIKM.

[5]  David A. Smith, et al. Quasi-Synchronous Grammars: Alignment by Soft Projection of Syntactic Dependencies, 2006, WMT@HLT-NAACL.

[6]  Jianfeng Gao, et al. Dependence language model for information retrieval, 2004, SIGIR '04.

[7]  W. Bruce Croft, et al. Learning concept importance using a weighted dependence model, 2010, WSDM '10.

[8]  Stuart M. Shieber, et al. Synchronous Tree-Adjoining Grammars, 1990, COLING.

[9]  Dan Klein, et al. Accurate Unlexicalized Parsing, 2003, ACL.

[10]  Vitor R. Carvalho, et al. Reducing long queries using query quality predictors, 2009, SIGIR.

[11]  Rohini K. Srihari, et al. Biterm language models for document retrieval, 2002, SIGIR '02.

[12]  W. Bruce Croft, et al. Discovering key concepts in verbose queries, 2008, SIGIR '08.

[13]  W. Bruce Croft, et al. Indri at TREC 2004: Terabyte Track, 2004, TREC.

[14]  Donald Metzler, et al. Automatic feature selection in the Markov random field model for information retrieval, 2007, CIKM '07.

[15]  Niladri Chatterjee, et al. Study of divergence for example based English-Hindi machine translation, 2001.

[16]  Dan Klein, et al. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network, 2003, NAACL.

[17]  W. Bruce Croft, et al. Query term ranking based on dependency parsing of verbose queries, 2010, SIGIR '10.

[18]  John D. Lafferty, et al. Information retrieval as statistical translation, 1999, SIGIR '99.

[19]  Loïc Maisonnasse, et al. Revisiting the dependence language model for information retrieval, 2007, SIGIR.

[20]  W. Bruce Croft, et al. The use of phrases and structured queries in information retrieval, 1991, SIGIR '91.

[21]  Van Rijsbergen, et al. A theoretical basis for the use of co-occurrence data in information retrieval, 1977.

[22]  Hinrich Schütze, et al. Book Reviews: Foundations of Statistical Natural Language Processing, 1999, CL.

[23]  Pu-Jen Cheng, et al. A term dependency-based approach for query terms ranking, 2009, CIKM.

[24]  Gary Geunbae Lee, et al. Dependency Structure Applied to Language Modeling for Information Retrieval, 2006.

[25]  Young-In Song, et al. A novel retrieval approach reflecting variability of syntactic phrase representation, 2007, Journal of Intelligent Information Systems.

[26]  Noah A. Smith, et al. What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA, 2007, EMNLP.

[27]  W. Bruce Croft, et al. A Markov random field model for term dependencies, 2005, SIGIR '05.