Textual evidence gathering and analysis

One useful source of evidence for evaluating a candidate answer to a question is a passage that contains the candidate answer and is relevant to the question. In the DeepQA pipeline, we retrieve passages using a novel technique that we call Supporting Evidence Retrieval, in which we perform a separate search query for each candidate answer, in parallel, and include the candidate answer as part of the query. We then score these passages using an assortment of algorithms that exploit different aspects of, and relationships among, the terms in the question and passage. We provide evidence that our mechanisms for obtaining and scoring passages have a substantial impact on the ability of our question-answering system to answer questions and to judge the confidence of its answers.
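As a concrete illustration of the Supporting Evidence Retrieval idea described above, the following is a minimal sketch under stated assumptions: the search backend (`search_fn`) is a hypothetical placeholder for whatever passage retrieval service is available, and the simple term-overlap scorer is only an illustrative stand-in for the assortment of passage-scoring algorithms used in DeepQA.

```python
# Minimal sketch of Supporting Evidence Retrieval (illustrative only).
# `search_fn` and `term_overlap_score` are hypothetical stand-ins, not
# DeepQA components.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def term_overlap_score(question: str, passage: str) -> float:
    """Illustrative scorer: fraction of question terms that appear in the passage."""
    q_terms = set(question.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / len(q_terms) if q_terms else 0.0


def supporting_evidence_retrieval(
    question: str,
    candidates: List[str],
    search_fn: Callable[[str], List[str]],  # query -> passages (hypothetical backend)
) -> Dict[str, float]:
    """Issue one search query per candidate answer, with the candidate answer
    included in the query, and return the best passage score for each candidate."""

    def evidence_for(candidate: str) -> float:
        query = f"{question} {candidate}"  # candidate answer is part of the query
        passages = search_fn(query)
        return max((term_overlap_score(question, p) for p in passages), default=0.0)

    # Queries for different candidates are independent, so they run in parallel.
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(evidence_for, candidates))
    return dict(zip(candidates, scores))
```

In this sketch, the per-candidate maximum passage score serves as the evidence signal for that candidate; in practice, multiple scorers and passages would feed a downstream answer-merging and confidence-estimation stage.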
