Text-to-Text Generation for Question Answering

In this chapter, we describe our work on text-to-text generation within the IMOGEN project. We focus on two research areas aimed at improving answer quality: (a) graph-based content selection, which makes the answer more useful, and (b) sentence fusion, which improves its formulation. Sentence fusion merges multiple sentences and eliminates their overlapping parts, thereby reducing redundancy. The results of this work have been applied in the IMIX system. IMIX first uses a question answering component to pinpoint text fragments that are relevant to the information need expressed by the user. A content selection component then uses these fragments as entry points in the text to formulate a more complete answer. Finally, sentence fusion is applied to the result to increase the fluency of the text.
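
To make the idea of using a fragment as an entry point for graph-based content selection more concrete, the following is a minimal sketch and not the method used in IMIX. It assumes that sentences are graph nodes, that edges are weighted by plain word-overlap cosine similarity, and that the answer is extended with the strongest neighbours of the entry sentence. All function names, the similarity measure, and the threshold are hypothetical choices made for illustration only.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence):
    """Lowercased word counts; an assumed stand-in for richer linguistic features."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_graph(sentences, threshold=0.1):
    """Undirected weighted graph: edge i-j when similarity exceeds the threshold."""
    vectors = [bag_of_words(s) for s in sentences]
    graph = {i: {} for i in range(len(sentences))}
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            weight = cosine(vectors[i], vectors[j])
            if weight > threshold:
                graph[i][j] = weight
                graph[j][i] = weight
    return graph


def select_content(sentences, entry, extra=1, threshold=0.1):
    """Return the entry sentence plus up to `extra` of its strongest
    neighbours, restored to document order."""
    graph = build_graph(sentences, threshold)
    neighbours = sorted(graph[entry], key=lambda j: graph[entry][j], reverse=True)
    chosen = sorted([entry] + neighbours[:extra])
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    text = [
        "RSI is caused by repetitive movements.",
        "Repetitive movements strain muscles and tendons.",
        "The weather was sunny yesterday.",
    ]
    # Sentence 0 plays the role of the fragment pinpointed by question answering.
    for sentence in select_content(text, entry=0, extra=1):
        print(sentence)
```

In this sketch the unrelated third sentence is never selected, because it shares no words with the entry sentence and therefore has no edge to it. A sentence fusion step would then operate on the selected sentences to remove their overlapping parts.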
