Toward a Gold Standard for Extractive Text Summarization

Extractive text summarization is the process of selecting relevant sentences from a collection of documents, perhaps only a single document, and arranging them in a purposeful way to form a summary of this collection. The question arises of just how good extractive summarization can ever be. Without generating language to express the gist of a text – its abstract – can we expect to make summaries which are both readable and informative? In search of an answer, we employed a corpus partially labelled with Summary Content Units (SCUs): snippets which convey the main ideas in the document collection. Starting from this corpus, we created SCU-optimal summaries for extractive summarization. We support the claim of optimality with a series of experiments.
