Extracting and characterizing gene-drug relationships from the literature.

A fundamental task of pharmacogenetics is to collect and classify relationships between genes and drugs. Currently, this information has not been comprehensively aggregated in any database and remains scattered throughout the published literature. Although there are efforts to collect it manually, they are limited by the size of the published literature on gene-drug relationships.

Therefore, we investigated computational methods to extract and characterize pharmacogenetic relationships between genes and drugs from the literature. We first evaluated the effectiveness of the co-occurrence method in identifying related genes and drugs. We then used supervised machine learning algorithms to classify the relationships between genes and drugs from the Pharmacogenetics and Pharmacogenomics Knowledge Base (PharmGKB) into five categories that have been defined by active pharmacogenetic researchers as relevant to their work.

The final co-occurrence algorithm extracted from the literature 78% of the related genes and drugs that were published in a review article. Our algorithm subsequently classified the relationships between genes and drugs from the PharmGKB into five categories with 74% accuracy. We have made the data available on a supplementary website at http://bionlp.stanford.edu/genedrug/.

Gene-drug relationships can be accurately extracted from text and classified into categories. Although the relationships that we have identified do not capture the details and fine distinctions often made in the literature, these methods will help scientists to track the ever-growing literature and create information resources to support future discoveries.
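To make the co-occurrence idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a gene and a drug are treated as candidate relatives when both names appear in the same text unit, and that recall is measured against a gold-standard list of pairs, as the abstract describes for the review article. The function names, the lexicons, the toy documents, the toy gold list, and the count threshold are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter
from itertools import product


def find_terms(text, lexicon):
    """Return the lexicon terms that occur in text as whole words (case-insensitive)."""
    found = set()
    for term in lexicon:
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.add(term)
    return found


def cooccurrence_counts(documents, genes, drugs):
    """Count, for every (gene, drug) pair, the documents that mention both."""
    counts = Counter()
    for doc in documents:
        genes_here = find_terms(doc, genes)
        drugs_here = find_terms(doc, drugs)
        # Each document contributes at most one count per pair.
        counts.update(product(sorted(genes_here), sorted(drugs_here)))
    return counts


def recall(predicted, gold):
    """Fraction of gold-standard pairs that were recovered."""
    gold = set(gold)
    return len(gold & set(predicted)) / len(gold) if gold else 0.0


# Toy data, invented for illustration only.
documents = [
    "CYP2D6 poor metabolizers had more adverse effects on metoprolol.",
    "Losartan pharmacokinetics depend on the CYP2C9 genotype.",
    "Metoprolol exposure was higher in carriers of CYP2D6 variants.",
]
genes = {"CYP2D6", "CYP2C9"}
drugs = {"metoprolol", "losartan"}

counts = cooccurrence_counts(documents, genes, drugs)
min_count = 1  # hypothetical threshold; raising it trades recall for precision
related = {pair for pair, n in counts.items() if n >= min_count}

# Hypothetical gold standard; the third pair never co-occurs in the toy documents.
gold = {("CYP2D6", "metoprolol"), ("CYP2C9", "losartan"), ("CYP2C9", "warfarin")}
print(f"recall = {recall(related, gold):.2f}")
```

The text unit here is a whole document. Counting within sentences instead would be a stricter variant of the same idea, and simple string matching against a lexicon ignores synonyms and abbreviations that a real system would need to handle.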
