Biomedical Natural Language Processing and Text Mining

Natural language processing and text mining (“BioNLP”) are branches of biomedical informatics that deal with processing prose, whether in journal articles or electronic medical records, for purposes such as information extraction and cohort retrieval. Both are made difficult by the pervasive ambiguity and variability of human-produced prose, and biomedical text poses special challenges of its own on a number of levels. Machine learning and rule-based approaches both have a long history in biomedical natural language processing, and hybrid systems are common. Much progress has been made in the field in recent years, and it is poised for explosive growth as new resources are expected to become available in the near future. Many open opportunities for research remain.
