Statistical Machine Translation for Query Expansion in Answer Retrieval

We present an approach to query expansion in answer retrieval that uses Statistical Machine Translation (SMT) techniques to bridge the lexical gap between questions and answers. SMT-based query expansion is done by i) using a full-sentence paraphraser to introduce synonyms in the context of the entire query, and ii) translating query terms into answer terms using a full-sentence SMT model trained on question-answer pairs. We evaluate these global, context-aware query expansion techniques on tf-idf retrieval from 10 million question-answer pairs extracted from FAQ pages. Experimental results show that SMT-based expansion improves retrieval performance over local expansion and over retrieval without expansion.
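
The idea of expanding a query with answer-side terms before tf-idf retrieval can be illustrated with a minimal sketch. This is not the system described above: the hand-written `TRANSLATIONS` table is a hypothetical stand-in for a trained full-sentence SMT model, it ignores query context entirely, and the two answers, the smoothed idf formula, and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical question-term -> answer-term table. A real system would
# obtain these from an SMT model trained on question-answer pairs.
TRANSLATIONS = {
    "fix": ["repair", "troubleshoot"],
    "laptop": ["notebook"],
}

# Toy answer collection (assumed data, not from the paper).
ANSWERS = [
    "repair a notebook screen by contacting the vendor",
    "recipes for baking bread at home",
]


def tokenize(text):
    """Lowercase and split on whitespace."""
    return text.lower().split()


def expand_query(query, table, max_new_terms=2):
    """Append up to max_new_terms translations per query term."""
    terms = tokenize(query)
    expanded = list(terms)
    for term in terms:
        for candidate in table.get(term, [])[:max_new_terms]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def build_index(docs):
    """Return (idf, tf-idf vectors) using a smoothed idf."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: tf[t] * idf[t] for t in tf})
    return idf, vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def score(query_terms, idf, vectors):
    """Score every document against the query; unknown terms are dropped."""
    tf = Counter(t for t in query_terms if t in idf)
    q = {t: tf[t] * idf[t] for t in tf}
    return [cosine(q, v) for v in vectors]


idf, vectors = build_index(ANSWERS)
query = "how to fix my laptop"
plain_scores = score(tokenize(query), idf, vectors)
expanded_scores = score(expand_query(query, TRANSLATIONS), idf, vectors)
best = max(range(len(ANSWERS)), key=lambda i: expanded_scores[i])
print(plain_scores, expanded_scores, ANSWERS[best])
```

In this toy setting the unexpanded query shares no terms with either answer, so only the expanded query can match the relevant one. That is the lexical gap the abstract refers to, shown at the smallest possible scale.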
