Overview of the BioCreative VI chemical-protein interaction Track

Despite the considerable number of available systems that automatically recognize mentions of genes/proteins and chemicals in text, only a limited number of attempts have been made so far to extract interactions between them. Most biomedical relation extraction systems focus on protein-protein or gene/chemical-disease relations. Detecting interactions between drugs and proteins/genes is of key relevance for pharmacological and clinical research: it plays an important role in drug discovery, in understanding the molecular mechanisms of adverse drug reactions, in describing drug metabolism, and in drawing regulatory networks of importance for systems pharmacology. The BioCreative VI ChemProt track represents the first attempt to promote the development of systems for extracting chemical-protein interactions (CPIs), of relevance for precision medicine as well as for drug discovery and basic biomedical research. The novel ChemProt corpus consists of text exhaustively annotated by hand with mentions of chemical compounds/drugs and genes/proteins, as well as 22 different types of compound-protein relations. To focus on a subset of important relations, 5 relation classes were used for evaluation purposes: agonist, antagonist, inhibitor, activator and substrate/product relations. A total of 13 participating teams returned 45 runs for this track. Despite the biological complexity of the considered relation types, top-scoring teams reached an F-measure across relation classes of 64.10%. Performance varied depending on the relation class: for the antagonist relation class the best team obtained an F-measure of 72.56% (precision of 80.75%, recall of 65.87%), while for inhibition/down-regulation the best value was 71.48% (precision of 76.51%, recall of 67.07%).

Keywords: text mining; chemical compound; drug; protein; drug target; agonist; antagonist; inhibitor; activator; gene regulation; chemical-protein relation
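The per-class scores quoted above can be sanity-checked with a short sketch. It assumes that the reported F-measure is the balanced F1 score, i.e. the harmonic mean of precision and recall; the abstract does not state this explicitly, and the function name `f_measure` is purely illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    Assumption: the track's F-measure is the balanced F1 score.
    Inputs and output are percentages (0-100).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs quoted in the abstract, with the reported F-measure.
reported = {
    "antagonist": (80.75, 65.87, 72.56),
    "inhibition/down-regulation": (76.51, 67.07, 71.48),
}

for relation, (p, r, f_reported) in reported.items():
    f = f_measure(p, r)
    # Small differences are expected because the quoted precision and
    # recall values are themselves rounded to two decimals.
    print(f"{relation}: computed {f:.2f}, reported {f_reported:.2f}")
```

Under this assumption, both computed values agree with the reported ones to within about 0.01 percentage points, a gap consistent with rounding of the quoted precision and recall.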
