Paper information - Automated method for extracting “citation sentences” from online biomedical articles using SVM-based text summarization technique

Comment-on (CON), a MEDLINE citation field, lists previously published articles that the authors of a given article comment on, possibly expressing complimentary or contradictory opinions. Our approach to identifying the CON list for a given article is first to extract all “citation sentences” from the body text, then to recognize which of these sentences (“CON sentences”) mention CON articles, and finally to analyze the corresponding bibliographic data in the reference section. As a preprocessing step for identifying the CON list, this paper presents a general method for extracting “citation sentences” from the body text of online biomedical articles using a support vector machine (SVM)-based text summarization technique. Input feature vectors for the SVM combine four types of features: 1) word statistics representing how differently a word occurs in “citation sentences” compared with other sentences, and the presence in a sentence of 2) author names, 3) publication years, and 4) citation tags. A rule-based post-processing step is also introduced to further reduce false negative errors in detecting “citation sentences”. Experiments on a set of online biomedical articles show that an SVM with a radial basis function (RBF) kernel achieves good overall performance in terms of accuracy, precision, recall, and F-measure. Our experiments also show that errors in extracting “citation sentences” cause a minor degradation in the performance of identifying CON sentences, and that this degradation can be reduced through the proposed rule-based post-processing.
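The four feature types named in the abstract can be illustrated with a minimal sketch. The abstract does not say how author names, publication years, or citation tags are detected, or how the word statistics are computed. The regular expressions, the `citation_word_weights` table, and the function names below are therefore hypothetical stand-ins, not the authors' method.

```python
import re

# Hypothetical patterns: assumptions for illustration only.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")                  # e.g. "2004"
CITATION_TAG_RE = re.compile(r"\[\d+(?:\s*[,-]\s*\d+)*\]")   # e.g. "[3]" or "[1, 2]"
AUTHOR_RE = re.compile(r"\b[A-Z][a-z]+(?: et al\.| and [A-Z][a-z]+)")


def word_score(sentence, citation_word_weights):
    """Average per-word weight; a stand-in for the paper's word statistics."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    if not tokens:
        return 0.0
    return sum(citation_word_weights.get(t, 0.0) for t in tokens) / len(tokens)


def sentence_features(sentence, citation_word_weights):
    """Build a 4-element feature vector for one sentence:
    [word statistic, has author name, has publication year, has citation tag]."""
    return [
        word_score(sentence, citation_word_weights),
        1.0 if AUTHOR_RE.search(sentence) else 0.0,
        1.0 if YEAR_RE.search(sentence) else 0.0,
        1.0 if CITATION_TAG_RE.search(sentence) else 0.0,
    ]


# Example with an assumed weight table.
weights = {"reported": 1.0}
cited = sentence_features("Smith et al. reported similar results in 2004 [3].", weights)
plain = sentence_features("The samples were stored at room temperature.", weights)
```

Here `cited` has all three binary cues set, while `plain` is all zeros. Vectors of this form could then be passed to any SVM implementation that supports an RBF kernel for training and classification.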

Daniel X. Le | In-Cheol Kim | George R. Thoma
