Using Nanoinformatics Methods for Automatically Identifying Relevant Nanotoxicology Entities from the Literature

Nanoinformatics is an emerging research field that uses informatics techniques to collect, process, store, and retrieve data, information, and knowledge on nanoparticles, nanomaterials, and nanodevices and their potential applications in health care. In this paper, we focus on the solutions that nanoinformatics can provide to facilitate nanotoxicology research. To this end, we take a computational approach to automatically recognize and extract nanotoxicology-related entities from the scientific literature. The entities of interest belong to four categories: nanoparticles, routes of exposure, toxic effects, and targets. The entity recognizer was trained using a corpus that we specifically created for this purpose and was validated by two nanomedicine/nanotoxicology experts. We evaluated the performance of the entity recognizer using 10-fold cross-validation. Precision ranges from 87.6% (targets) to 93.0% (routes of exposure), while recall ranges from 82.6% (routes of exposure) to 87.4% (toxic effects). These results demonstrate the feasibility of using computational approaches to reliably perform tasks that depend on named entity recognition (NER), such as augmented reading or semantic search. This research is a "proof of concept" that can be expanded to stimulate further developments that could assist researchers in managing data, information, and knowledge at the nanolevel, thus accelerating research in nanomedicine.
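The evaluation described above (per-category precision and recall under k-fold cross-validation) can be illustrated with a minimal sketch. It is not the implementation used in this work. It assumes exact-match scoring over entity spans, which the abstract does not specify, and the category labels, the tuple layout `(doc_id, start, end, category)`, and the helper names `k_fold_indices` and `per_category_scores` are hypothetical.

```python
# Hypothetical labels for the four entity categories named in the abstract.
CATEGORIES = ("nanoparticle", "route_of_exposure", "toxic_effect", "target")


def k_fold_indices(n_items, k=10):
    """Partition range(n_items) into k disjoint test folds of near-equal size."""
    folds = [[] for _ in range(k)]
    for i in range(n_items):
        folds[i % k].append(i)
    return folds


def per_category_scores(gold, predicted):
    """Return {category: (precision, recall)} using exact span matching.

    gold, predicted: sets of (doc_id, start, end, category) tuples.
    A prediction counts as correct only if the same tuple is in gold
    (an assumption; partial-match scoring would give different numbers).
    """
    scores = {}
    for cat in CATEGORIES:
        g = {e for e in gold if e[3] == cat}
        p = {e for e in predicted if e[3] == cat}
        true_positives = len(g & p)
        precision = true_positives / len(p) if p else 0.0
        recall = true_positives / len(g) if g else 0.0
        scores[cat] = (precision, recall)
    return scores


# Toy annotations, invented purely for illustration.
gold = {
    (0, 0, 3, "nanoparticle"),
    (0, 10, 20, "toxic_effect"),
    (1, 5, 9, "target"),
    (1, 12, 22, "route_of_exposure"),
}
predicted = {
    (0, 0, 3, "nanoparticle"),
    (0, 10, 18, "toxic_effect"),   # boundary mismatch, so not a match
    (1, 5, 9, "target"),
    (1, 30, 35, "target"),         # spurious prediction
}

scores = per_category_scores(gold, predicted)
folds = k_fold_indices(25, k=10)
```

In a full cross-validation run, each fold in `folds` would serve once as the test set while a model is trained on the remaining documents, and the per-category scores would then be aggregated across the ten folds.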
