Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing

Natural language processing has a long history in the medical domain, with research in the field dating back to at least the early 1960s. In the late 1990s, a separate thread of research involving natural language processing in the genomic domain began to gather steam. This thread has since become a major focus of research in the bioinformatics, computational biology, and computational linguistics communities. A number of successful workshops and conference sessions have resulted, with significant progress in the areas of named entity recognition for a wide range of key biomedical classes, concept normalization, and system evaluation. A variety of publicly available resources have also contributed to this progress. Recently, the widely recognized disconnect between basic biological research and patient care delivery has stimulated the development of a new branch of biomedical research: translational medicine. Translational medicine, sometimes defined as the facilitation of "bench-to-bedside" transmission of knowledge, has become a hot topic, with a National Center for Biocomputing devoted to this theme established last year. This workshop aims to address and bring together these three threads in biomedical natural language processing, or "BioNLP": biological, translational, and clinical language processing.
