Condensing biomedical journal texts through paragraph ranking

MOTIVATION The growing availability of full-text scientific articles raises the important issue of how to most efficiently digest full-text content. Although article titles and abstracts provide accurate and concise information on an article's contents, their brevity inevitably entails the loss of detail. Full-text articles provide those details, but require more time to read. The primary goal of this study is to combine the advantages of concise abstracts and detail-rich full-texts to ease the burden of reading. RESULTS We retrieved abstract-related paragraphs from full-text articles through shared keywords between the abstract and paragraphs from the main text. Significant paragraphs were then recommended by applying a proposed paragraph ranking approach. Finally, the user was provided with a condensed text consisting of these significant paragraphs, allowing the user to save time from perusing the whole article. We compared the performance of the proposed approach with a keyword counting approach and a PageRank-like approach. Evaluation was conducted in two aspects: the importance of each retrieved paragraph and the information coverage of a set of retrieved paragraphs. In both evaluations, the proposed approach outperformed the other approaches. CONTACT jchiang@mail.ncku.edu.tw.

[1]  Hans Peter Luhn,et al.  The Automatic Creation of Literature Abstracts , 1958, IBM J. Res. Dev..

[2]  Dan Klein,et al.  Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network , 2003, NAACL.

[3]  Sergey Brin,et al.  The Anatomy of a Large-Scale Hypertextual Web Search Engine , 1998, Comput. Networks.

[4]  Roman A. Laskowski,et al.  Enhancing the functional annotation of PDB structures in PDBsum using key figures extracted from the literature , 2007, Bioinform..

[5]  Rada Mihalcea,et al.  TextRank: Bringing Order into Text , 2004, EMNLP.

[6]  Jung-Hsien Chiang,et al.  GeneLibrarian: an effective gene-information summarization and visualization system , 2006, BMC Bioinformatics.

[7]  Chin-Yew Lin,et al.  ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.

[8]  Peter D. Turney Learning Algorithms for Keyphrase Extraction , 2000, Information Retrieval.

[9]  Mark Steedman,et al.  Example Selection for Bootstrapping Statistical Parsers , 2003, NAACL.

[10]  Alan R. Aronson,et al.  Semi-Automatic Indexing of Full Text Biomedical Articles , 2005, AMIA.

[11]  Zhiyong Lu,et al.  Click-words: learning to predict document keywords from a user perspective , 2010, Bioinform..

[12]  Anette Hulth,et al.  Improved Automatic Keyword Extraction Given More Linguistic Knowledge , 2003, EMNLP.

[13]  Pawan Kumar,et al.  Notice of Violation of IEEE Publication Principles The Anatomy of a Large-Scale Hyper Textual Web Search Engine , 2009 .

[14]  Jimmy J. Lin Is searching full text more effective than searching abstracts? , 2009, BMC Bioinformatics.

[15]  Miguel A. Andrade-Navarro,et al.  Information extraction from full text scientific articles: Where are the keywords? , 2003, BMC Bioinformatics.

[16]  Jimmy J. Lin,et al.  PubMed related articles: a probabilistic topic-based model for content similarity , 2007, BMC Bioinformatics.

[17]  Michael Schroeder,et al.  GoPubMed: exploring PubMed with the Gene Ontology , 2005, Nucleic Acids Res..

[18]  Claus-Wilhelm von der Lieth,et al.  PubFinder: a tool for improving retrieval rate of relevant PubMed abstracts , 2005, Nucleic Acids Res..

[19]  Matthias Frisch,et al.  LitInspector: literature and signal transduction pathway mining in PubMed abstracts , 2009, Nucleic Acids Res..