Capturing the Semantics of Key Phrases Using Multiple Languages for Question Retrieval

In the age of Web 2.0, community user-contributed questions and answers provide an important alternative for knowledge acquisition through web search. Question retrieval in current community-based question answering (CQA) services does not, in general, work well for long and complex queries, such as questions posed in natural language. The main reasons are the verbosity of natural language queries and the word mismatch between the queries and the candidate questions in the CQA archive during retrieval. To address these two problems, existing solutions try to refine the search queries by distinguishing the key concepts in the queries and expanding the queries with relevant content. However, existing query refinement approaches can only separate key concepts from non-key concepts, while the differences among the key concepts are overlooked. Moreover, existing query expansion approaches not only overlook the weights of key concepts in the queries but also fail to consider concept-level expansion for them. In this paper, we explore a key concept identification approach for query refinement and a pivot language translation-based approach to key concept paraphrasing. We further propose a new question retrieval model that seamlessly integrates the key concepts and their paraphrases. The experimental results demonstrate that the integrated retrieval model significantly outperforms the state-of-the-art models in question retrieval.
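
The abstract does not spell out how the pivot language paraphrasing is computed. As a rough illustration only, the sketch below assumes the common pivot formulation, in which a paraphrase score is obtained by translating a phrase into a pivot language and back, summing over all pivot translations. The function name `pivot_paraphrases`, the table layout, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def pivot_paraphrases(phrase, src_to_pivot, pivot_to_src):
    """Score paraphrase candidates for `phrase` through a pivot language.

    Assumed formulation (not confirmed by the abstract):
        score(candidate | phrase) = sum over pivot f of
            p(f | phrase) * p(candidate | f)
    Both tables map a phrase to a dict of {translation: probability}.
    """
    scores = defaultdict(float)
    for pivot, p_forward in src_to_pivot.get(phrase, {}).items():
        for candidate, p_backward in pivot_to_src.get(pivot, {}).items():
            if candidate != phrase:  # skip the trivial identity paraphrase
                scores[candidate] += p_forward * p_backward
    # Highest-scoring paraphrases first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy translation tables with made-up probabilities (English -> French -> English)
src_to_pivot = {
    "lose weight": {"perdre du poids": 0.7, "maigrir": 0.3},
}
pivot_to_src = {
    "perdre du poids": {"lose weight": 0.6, "shed pounds": 0.4},
    "maigrir": {"lose weight": 0.5, "slim down": 0.5},
}

result = pivot_paraphrases("lose weight", src_to_pivot, pivot_to_src)
for candidate, score in result:
    print(f"{candidate}: {score:.2f}")
```

In a setting with several pivot languages, as the title suggests, one plausible extension would be to combine the scores obtained from each language pair, but how the paper actually combines them is not stated here.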
