A CTD–Pfizer collaboration: manual curation of 88 000 scientific articles text mined for drug–disease and drug–phenotype interactions

Improving the prediction of chemical toxicity is a goal common to both environmental health research and pharmaceutical drug development. To improve safety detection assays, it is critical to have a reference set of molecules with well-defined toxicity annotations for training and validation purposes. Here, we describe a collaboration between safety researchers at Pfizer and the research team at the Comparative Toxicogenomics Database (CTD) to text mine and manually review a collection of 88 629 articles relating over 1 200 pharmaceutical drugs to their potential involvement in cardiovascular, neurological, renal and hepatic toxicity. In 1 year, CTD biocurators curated 254 173 toxicogenomic interactions (152 173 chemical–disease, 58 572 chemical–gene, 5 345 gene–disease and 38 083 phenotype interactions). All chemical–gene–disease interactions are fully integrated with public CTD, and phenotype interactions can be downloaded. We describe Pfizer’s text-mining process to collate the articles, and CTD’s curation strategy, performance metrics, enhanced data content and new module to curate phenotype information. We also show how data integration can connect phenotypes to diseases. This curation can be leveraged for information about toxic endpoints important to drug safety and can help develop testable hypotheses for drug–disease events. The availability of these detailed, contextualized, high-quality annotations curated from seven decades’ worth of the scientific literature should help facilitate new mechanistic screening assays for pharmaceutical compound survival. This unique partnership demonstrates the importance of resource sharing and collaboration between public and private entities and underscores the complementary needs of the environmental health science and pharmaceutical communities. Database URL: http://ctdbase.org/
