Using Argumentative Zones for Extractive Summarization of Scientific Articles

Information structure, i.e., the way speakers construct sentences to present new information in the context of old, can capture rich linguistic information about the discourse structure of scientific documents. It has been found useful for important Natural Language Processing (NLP) tasks such as information retrieval and extraction. Since scientific articles typically follow a certain discourse structure describing the prior work, the problem being solved, the methods used, and so forth, information structure could also be useful for summarizing these articles. In this work we focus on a scheme of information structure called Argumentative Zoning (AZ) and investigate whether its categories can support extractive text summarization in a scientific domain. We develop a summarization system that uses AZ categories (i) as features and (ii) in the final sentence selection process. We evaluate the system both directly and through a task-based evaluation. The results show that AZ can support both full-document and customized summarization. We report a statistically significant improvement in summarization performance over a competitive baseline that uses journal section labels instead of AZ information.
