Measuring Variability in Sentence Ordering for News Summarization

Sentence ordering is important for natural language tasks such as multi-document summarization, yet the range of acceptable sentence orderings for short texts has not been explored quantitatively. We present results of a sentence reordering experiment with three experimental conditions. Our findings indicate a very high degree of variability in the orderings that the eighteen subjects produce. In addition, the variability of reorderings is significantly greater when the initial ordering seen by subjects differs from that of the original summary. We conclude that evaluation of sentence ordering should use multiple reference orderings, and we present several metrics that might prove useful for assessing an ordering against multiple references. We close with a deeper set of questions: (a) what sorts of independent assessments of the quality of the different reference orderings could be made, and (b) whether a large enough test set would obviate the need for such independent means of quality assessment.
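
To make the idea of assessing an ordering against multiple references concrete, the following is a minimal sketch and not the metrics proposed in the paper, which the abstract does not name. It assumes Kendall's tau, a rank-correlation measure often used for information ordering, as the pairwise score, and it assumes each sentence has a unique identifier. The function names `kendall_tau` and `score_against_references` are hypothetical.

```python
from itertools import combinations


def kendall_tau(candidate, reference):
    """Kendall's tau between two orderings of the same unique sentence ids.

    Returns 1.0 for identical orderings and -1.0 for exactly reversed ones.
    """
    if sorted(candidate) != sorted(reference):
        raise ValueError("orderings must contain the same sentence ids")
    n = len(candidate)
    if n < 2:
        return 1.0  # a single sentence admits only one ordering
    # Position of each sentence in the reference ordering.
    position = {sentence: i for i, sentence in enumerate(reference)}
    ranks = [position[sentence] for sentence in candidate]
    # A pair is discordant when the candidate reverses the reference order.
    discordant = sum(
        1 for i, j in combinations(range(n), 2) if ranks[i] > ranks[j]
    )
    total_pairs = n * (n - 1) // 2
    return 1.0 - 2.0 * discordant / total_pairs


def score_against_references(candidate, references):
    """Summarize agreement with several reference orderings (assumed design).

    Reports the mean, best, and worst tau rather than a single number,
    since the references themselves may disagree with one another.
    """
    taus = [kendall_tau(candidate, ref) for ref in references]
    return {"mean": sum(taus) / len(taus), "max": max(taus), "min": min(taus)}
```

Reporting the maximum alongside the mean is one possible design choice: a candidate that matches any one human ordering closely may be acceptable even when the references vary widely among themselves.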
