Paper information - Condensing biomedical journal texts through paragraph ranking

Condensing biomedical journal texts through paragraph ranking

MOTIVATION: The growing availability of full-text scientific articles raises the question of how to digest full-text content most efficiently. Article titles and abstracts summarize an article's contents accurately and concisely, but their brevity inevitably sacrifices detail. Full-text articles supply that detail but take longer to read. The primary goal of this study is to combine the advantages of concise abstracts and detail-rich full texts to ease the burden of reading.

RESULTS: We retrieved abstract-related paragraphs from full-text articles by matching keywords shared between the abstract and paragraphs of the main text. Significant paragraphs were then recommended by a proposed paragraph ranking approach. Finally, the user was given a condensed text consisting of these significant paragraphs, which saves the time of reading the whole article. We compared the proposed approach with a keyword counting approach and a PageRank-like approach. Evaluation covered two aspects: the importance of each retrieved paragraph and the information coverage of a set of retrieved paragraphs. In both evaluations, the proposed approach outperformed the other approaches.

CONTACT: jchiang@mail.ncku.edu.tw
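The retrieval step described in the abstract, together with the keyword counting baseline it mentions, can be illustrated with a minimal sketch. This is not the paper's proposed ranking approach, whose details are not given here. The stopword list, the tokenizer, and the names `keywords`, `rank_paragraphs`, and `condense` are all assumptions made for illustration.

```python
import re

# Hypothetical stopword list; the paper's keyword extraction is not described here.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are",
             "for", "with", "on", "by", "that", "this", "we", "our"}


def keywords(text):
    """Return the set of lowercase tokens longer than two characters that are not stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS and len(t) > 2}


def rank_paragraphs(abstract, paragraphs, top_k=3):
    """Rank paragraph indices by the number of keywords shared with the abstract."""
    abstract_kw = keywords(abstract)
    scored = []
    for index, paragraph in enumerate(paragraphs):
        shared = abstract_kw & keywords(paragraph)
        if shared:  # keep only paragraphs related to the abstract
            scored.append((len(shared), index))
    # Highest count first; ties keep the original paragraph order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:top_k]]


def condense(abstract, paragraphs, top_k=3):
    """Return the top-ranked paragraphs, restored to their original reading order."""
    selected = sorted(rank_paragraphs(abstract, paragraphs, top_k))
    return [paragraphs[i] for i in selected]
```

Calling `condense(abstract, paragraphs, top_k=3)` returns at most three paragraphs in reading order. Counting shared keywords treats every keyword as equally informative, which is one reason a dedicated ranking method could do better than this baseline.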
