Probabilistic models for answer-ranking in multilingual question-answering

This article presents two probabilistic models for answer ranking in multilingual question-answering (QA), a task that seeks exact answers to natural-language questions written in different languages. Although some probabilistic methods have been used in traditional monolingual answer ranking, little prior research has applied formal methods to answer ranking in multilingual QA. This article first describes a probabilistic model that predicts the probability of correctness for each answer candidate independently. It then proposes a novel probabilistic method that jointly predicts the correctness of answers by considering both the correctness of individual answers and the correlations among them. As far as we know, this is the first probabilistic framework that models the correctness and correlation of answer candidates in multilingual QA and that provides a novel approach to designing a flexible and extensible system architecture for answer selection in multilingual QA. An extensive set of experiments was conducted to show the effectiveness of the proposed probabilistic methods in English-to-Chinese and English-to-Japanese cross-lingual QA, as well as in English, Chinese, and Japanese monolingual QA, using TREC and NTCIR questions.
