Using a Keyness Metric for Single and Multi Document Summarisation

In this paper we present the results of our participation in the MultiLing 2013 summarisation tasks. We participated with single-document and multi-document corpus-based summarisers for both Arabic and English. The summarisers used word frequency lists and log-likelihood calculations to generate single-document and multi-document summaries. The summaries generated by our systems were evaluated by native speakers of Arabic and English and by several automatic evaluation metrics: ROUGE, AutoSummENG, MeMoG and NPowER. We compare our results to those of the other systems that participated in the same tracks for both languages. Our single-document summarisers performed particularly well in the automatic evaluation, with our English single-document summariser performing better on average than those of the other participants. Our Arabic multi-document summariser performed well in the human evaluation, ranking second.
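
As an illustration of the kind of calculation involved, the following is a minimal sketch of log-likelihood keyness scoring against a reference word frequency list. It is not the system described in the paper. It uses the standard two-corpus log-likelihood statistic from corpus linguistics, and everything else is an assumption made for illustration: the function names, the whitespace tokenisation, the restriction to words over-represented in the document, and the choice to score a sentence by summing the keyness of its words.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Two-corpus log-likelihood for one word.

    a: frequency of the word in the document
    b: frequency of the word in the reference corpus
    c: total number of tokens in the document
    d: total number of tokens in the reference corpus
    """
    # Expected frequencies if the word were equally common in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keyness_scores(doc_tokens, ref_freq, ref_size):
    """Score words that are relatively more frequent in the document
    than in the reference list (assumed filter, for illustration)."""
    doc_freq = Counter(doc_tokens)
    doc_size = len(doc_tokens)
    scores = {}
    for word, a in doc_freq.items():
        b = ref_freq.get(word, 0)
        if a / doc_size > b / ref_size:
            scores[word] = log_likelihood(a, b, doc_size, ref_size)
    return scores


def summarise(sentences, ref_freq, ref_size, n=1):
    """Pick the n sentences with the highest summed keyness,
    returned in their original order (hypothetical selection rule)."""
    tokenised = [s.lower().split() for s in sentences]
    all_tokens = [tok for sent in tokenised for tok in sent]
    scores = keyness_scores(all_tokens, ref_freq, ref_size)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: sum(scores.get(tok, 0.0) for tok in tokenised[i]),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:n])]
```

Here `ref_freq` stands in for a word frequency list, a mapping from word to count, and `ref_size` for the total token count of the reference corpus. For a multi-document setting, one simple assumption would be to pool the sentences of all documents in a cluster before calling `summarise`.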
