Building an Annotated Corpus for Text Summarization and Question Answering

We describe ongoing work on semi-automatic corpus annotation, with two goals: answering why-questions in a question answering system and constructing coherence trees for text summarization. In this paper we present annotation schemas for identifying the discourse relations that hold between parts of a text, as well as the particular text spans that each discourse relation connects. Furthermore, we address several tasks involved in building a corpus annotated at the discourse level, namely creating annotation guidelines, ensuring annotation accuracy, and evaluating the result.
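
As an illustration only, an annotation that links two text spans through a discourse relation could be stored as in the minimal Python sketch below. The class names, the character-offset convention, the "cause" label, and the English example sentence are all assumptions made for this sketch; they are not the schema defined in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A text span given by character offsets (assumed convention)."""
    start: int  # inclusive offset
    end: int    # exclusive offset

    def text(self, document: str) -> str:
        # Recover the annotated text from the source document.
        return document[self.start:self.end]


@dataclass(frozen=True)
class RelationAnnotation:
    """One discourse relation holding between two spans."""
    relation: str  # hypothetical label, e.g. "cause"
    arg1: Span
    arg2: Span


# Hypothetical example sentence and annotation.
document = "The road flooded because it rained all night."
annotation = RelationAnnotation(
    relation="cause",
    arg1=Span(0, 16),   # the effect span
    arg2=Span(25, 44),  # the cause span
)

print(annotation.relation,
      "|", annotation.arg1.text(document),
      "|", annotation.arg2.text(document))
```

Keeping offsets rather than copied strings means every annotation can be checked against the source text, which is one simple way to support the accuracy checks mentioned above.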
