Annotation and Analysis of Extractive Summaries for the Kyutech Corpus

Summarization of multi-party conversations is an important task in natural language processing. For this task, corpora play an important role both in analyzing the characteristics of conversations and in constructing summary generation methods. We are developing a freely available Japanese conversation corpus for a decision-making task, which we call the Kyutech corpus. The current version of the Kyutech corpus contains a topic tag for each utterance and reference summaries for each conversation. In this paper, we describe an annotation task for extractive summaries. In this task, we annotate each utterance with an importance tag and link utterances to sentences in the reference summaries that already exist in the Kyutech corpus. The annotated extractive summaries make it possible to evaluate extractive summarization methods on the Kyutech corpus. In the experiment, we compare several machine-learning-based methods using several features.
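
As a rough illustration of the annotation described above, the sketch below shows one way an utterance could carry an importance tag and links to reference-summary sentences, and how a predicted extractive summary could be scored against those tags. The class name, field names, and the choice of precision, recall, and F1 as the scoring measure are assumptions made for this example; they are not taken from the Kyutech corpus format or from the experimental setup.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class Utterance:
    """Hypothetical record for one annotated utterance (illustrative only)."""
    utt_id: int
    speaker: str
    text: str
    important: bool = False  # importance tag assigned by an annotator
    # Indices of the reference-summary sentences this utterance is linked to.
    summary_links: List[int] = field(default_factory=list)


def extractive_prf(utterances: Iterable[Utterance],
                   predicted_ids: Iterable[int]) -> Tuple[float, float, float]:
    """Score predicted utterance ids against the gold importance tags.

    Returns (precision, recall, F1). Using these measures is an assumption
    of this sketch, not something stated in the abstract.
    """
    gold = {u.utt_id for u in utterances if u.important}
    pred = set(predicted_ids)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy conversation: utterances 1 and 3 are tagged as important.
conversation = [
    Utterance(0, "A", "Shall we start?"),
    Utterance(1, "B", "I suggest the restaurant on the first floor.",
              important=True, summary_links=[0]),
    Utterance(2, "C", "Hmm."),
    Utterance(3, "A", "Agreed, let's choose that one.",
              important=True, summary_links=[0, 1]),
]

# A hypothetical system selects utterances 1 and 2.
scores = extractive_prf(conversation, [1, 2])
```

In this toy case one of the two selected utterances is tagged as important and one of the two important utterances is selected, so precision, recall, and F1 all come out at 0.5.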
