An Efficient Statistical Approach for Automatic Organic Chemistry Summarization

In this paper, we propose an efficient strategy for summarizing scientific documents in Organic Chemistry that relies on numerical processing. We present its implementation, named yachs (Yet Another Chemistry Summarizer), which combines a specific document pre-processing step with a sentence-scoring method based on the statistical properties of documents. We show that yachs outperforms several other summarizers on a corpus of Organic Chemistry articles.
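The abstract does not specify how yachs pre-processes documents or scores sentences. To illustrate the general idea of statistical sentence scoring, here is a minimal sketch of a generic term-frequency extractive summarizer. It is not the yachs method. The stopword list, the sentence splitter, the tokenizer (which keeps hyphenated tokens such as "2-methylpropane" intact as a crude stand-in for chemistry-aware pre-processing), and the scoring formula are all illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "with", "for", "on", "by", "we"}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    # Lowercased word tokens; digits and hyphens are kept inside a token so that
    # simple chemical names like "2-methylpropane" are not broken apart.
    tokens = re.findall(r"[a-z0-9][a-z0-9-]*", sentence.lower())
    return [t for t in tokens if t not in STOPWORDS]


def score_sentences(sentences: list[str]) -> list[float]:
    # Score each sentence by the mean normalized document frequency of its terms:
    # sentences made of terms that recur across the document score higher.
    token_lists = [tokenize(s) for s in sentences]
    doc_freq = Counter(t for toks in token_lists for t in toks)
    if not doc_freq:
        return [0.0] * len(sentences)
    max_f = max(doc_freq.values())
    scores = []
    for toks in token_lists:
        if not toks:
            scores.append(0.0)
        else:
            scores.append(sum(doc_freq[t] / max_f for t in toks) / len(toks))
    return scores


def summarize(text: str, k: int = 2) -> list[str]:
    # Pick the k highest-scoring sentences and return them in document order.
    sentences = split_sentences(text)
    scores = score_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    doc = ("Benzene reacts with bromine. The reaction of benzene with bromine needs a catalyst. "
           "Yields were reported. Benzene is aromatic and toxic.")
    print(summarize(doc, k=2))
```

On the toy input above, the two sentences dominated by the recurring term "benzene" are selected. A real chemistry summarizer would need proper chemical-name recognition, since names containing commas or parentheses (e.g. "1,3-dimethylbenzene") would still be split by this tokenizer.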
