Introducing a Corpus of Human-Authored Dialogue Summaries in Portuguese

In this paper, we introduce a corpus of human-authored dialogue summaries collected through a web experiment. The corpus is (i) one of the few existing corpora of written dialogue summaries; (ii) the only available corpus of dialogue summaries in Portuguese; and (iii) the only available corpus of summaries produced for dialogues in which the participants' politeness alignment was systematically varied. Comprising 1,808 human-authored summaries, produced by 452 summarisers for four different dialogues, it is, to the best of our knowledge, the largest individual corpus of dialogue summaries available and the one involving the highest number of participants.
