Annotation of Chemical Named Entities

We describe the annotation of chemical named entities in scientific text. A set of annotation guidelines defines five types of named entities and provides instructions for resolving special cases. A corpus of full-text chemistry papers was annotated, with an inter-annotator agreement F score of 93%. An investigation of named entity recognition using LingPipe suggests that F scores of 63% are possible without customisation, and that scores of 74% are possible with custom tokenisation and the use of dictionaries.
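For readers unfamiliar with the metric, the following is a minimal sketch of how a pairwise F score between two annotators might be computed. It assumes that an entity counts as a match only when both its character span and its type agree exactly; the abstract does not state the matching criterion actually used. The function name, the tuple layout, and the type labels `TYPE_A` and `TYPE_B` are hypothetical placeholders, not the guideline's entity types.

```python
from typing import Set, Tuple

# (start offset, end offset, entity type); layout and labels are illustrative only
Entity = Tuple[int, int, str]


def f_score(reference: Set[Entity], candidate: Set[Entity]) -> float:
    """Balanced F score, counting only exact span-and-type matches (an assumption)."""
    if not reference or not candidate:
        return 0.0
    matches = len(reference & candidate)
    precision = matches / len(candidate)   # share of candidate entities that match
    recall = matches / len(reference)      # share of reference entities that were found
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy annotations: the annotators agree on two entities, disagree on the type
# of a third, and the second annotator marks one extra entity.
annotator_a = {(0, 7, "TYPE_A"), (12, 20, "TYPE_B"), (25, 31, "TYPE_A")}
annotator_b = {
    (0, 7, "TYPE_A"),
    (12, 20, "TYPE_A"),
    (25, 31, "TYPE_A"),
    (40, 44, "TYPE_B"),
}

agreement = f_score(annotator_a, annotator_b)
print(f"Inter-annotator F score: {agreement:.3f}")
```

Because the balanced F score is the harmonic mean of precision and recall, swapping which annotator is treated as the reference leaves the value unchanged, which is one reason it is convenient for reporting agreement between two annotators.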
