A syntactic tree matching approach to finding similar questions in community-based QA services

While traditional question answering (QA) systems tailored to the TREC QA task work relatively well for simple questions, they do not suffice to answer real-world questions. Community-based QA services fill this gap, as they hold large archives of such questions for which manually crafted answers are directly available. However, finding similar questions in a QA archive is not trivial. In this paper, we propose a new retrieval framework based on syntactic tree structure to tackle the similar-question matching problem. We build a ground-truth set from Yahoo! Answers, and experimental results show that our method outperforms traditional bag-of-words or tree-kernel-based methods by 8.3% in mean average precision. It further achieves up to 50% improvement when semantic features and the matching of potential answers are incorporated. Our model does not rely on training, and it is also shown to be robust against grammatical errors.
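
For readers unfamiliar with the evaluation metric named above, the following is a minimal sketch of how mean average precision is commonly computed over a set of queries. It illustrates the standard definition only and is not the authors' evaluation code; the function names and the toy question IDs are hypothetical.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list against its set of relevant IDs."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            # Precision at this rank, counted only at ranks holding a relevant item.
            precision_sum += hits / rank
    # Normalise by the number of relevant items, retrieved or not.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-query average precision values."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy example with hypothetical archived-question IDs for two queries.
runs = [
    (["q1", "q2", "q3"], {"q1", "q3"}),  # AP = (1/1 + 2/3) / 2
    (["q9", "q4"], {"q4"}),              # AP = (1/2) / 1
]
map_score = mean_average_precision(runs)
```

In this sketch a relevant question that is never retrieved still counts in the denominator, so a system is penalised for missing similar questions as well as for ranking them low.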
