A survey on question answering technology from an information retrieval perspective

This article provides a comprehensive and comparative overview of question answering technology. It presents the question answering task from an information retrieval perspective and emphasises the importance of retrieval models, i.e., representations of queries and information documents, and of retrieval functions, which are used for estimating the relevance between a query and an answer candidate. The survey suggests a general question answering architecture that steadily increases the complexity of the representation level of questions and information objects. On the one hand, natural language queries are reduced to keyword-based searches; on the other hand, knowledge bases are queried with structured or logical queries obtained from the natural language questions, and answers are obtained through reasoning. We discuss different levels of processing, which yield bag-of-words-based representations as well as more complex ones that integrate part-of-speech tags, classification of the expected answer type, semantic roles, discourse analysis, translation into an SQL-like language, and logical representations.
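As an illustration of the simplest representation level mentioned above, the following minimal sketch represents a question and its answer candidates as bags of words and uses cosine similarity as the retrieval function. The function names (`bag_of_words`, `cosine`, `rank_candidates`), the tokenisation rule, and the choice of cosine similarity are assumptions made for this example; they are not taken from the survey.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens (hypothetical tokeniser)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(q, d):
    """Cosine similarity between two term-count vectors; 0.0 if either is empty."""
    dot = sum(q[t] * d[t] for t in q.keys() & d.keys())
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0
    return dot / (norm_q * norm_d)


def rank_candidates(question, candidates):
    """Return (score, candidate) pairs sorted by descending estimated relevance."""
    q = bag_of_words(question)
    scored = [(cosine(q, bag_of_words(c)), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    ranked = rank_candidates(
        "Who wrote Hamlet?",
        [
            "The Eiffel Tower is in Paris.",
            "Shakespeare wrote Hamlet around 1600.",
        ],
    )
    for score, candidate in ranked:
        print(f"{score:.3f}  {candidate}")
```

Richer representation levels (part-of-speech tags, expected answer types, semantic roles, logical forms) would replace `bag_of_words` and `cosine` with more structured representations and matching functions, while the ranking step stays the same in this sketch.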
