Automated extraction and semantic analysis of mutation impacts from the biomedical literature

Background

Mutations, as sources of evolution, have long been a focus of attention in the biomedical literature. Access to mutation information and to the impacts of mutations on protein properties facilitates research in various domains, such as enzymology and pharmacology. However, manually curating the rich and fast-growing repository of biomedical literature is expensive and time-consuming. As a solution, text mining approaches have increasingly been deployed in the biomedical domain. While the detection of single-point mutations is well covered by existing systems, challenges still exist in grounding impacts to their respective mutations and in recognizing the affected protein properties, in particular kinetic and stability properties together with physical quantities.

Results

We present an ontology model for mutation impacts, together with a comprehensive text mining system for extracting and analysing mutation impact information from full-text articles. Organisms, as sources of proteins, are extracted to help disambiguate genes and proteins. Our system then detects mutation series and uses novel heuristics to ground the detected impacts to the correct mutations. It also extracts the affected protein properties, in particular kinetic and stability properties, as well as the magnitude of the effects, and validates these relations against the domain ontology. The output of our system can be provided in various formats, in particular by populating an OWL-DL ontology, which can then be queried to provide structured information. The performance of the system is evaluated on our manually annotated corpora. In the impact detection task, our system achieves a precision of 70.4%-71.1% and a recall of 71.3%-71.5%, and grounds the detected impacts with an accuracy of 76.5%-77%. The developed system, including resources, evaluation data, and end-user and developer documentation, is freely available under an open source license at http://www.semanticsoftware.info/open-mutation-miner.

Conclusion

We present Open Mutation Miner (OMM), the first comprehensive, fully open-source approach to automatically extract impacts and related relevant information from the biomedical literature. We assessed the performance of our work on manually annotated corpora, and the results show the reliability of our approach. Representing the extracted information in a structured format facilitates knowledge management and aids in database curation and correction. Furthermore, access to the analysis results is provided through multiple interfaces, including web services for automated data integration and desktop-based solutions for end-user interaction.
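
To make concrete what the detection of single-point mutation mentions involves, the following is a minimal sketch in Python. It is not OMM's implementation: the function name `find_point_mutations`, the regular expression, and the normalized output form are assumptions made for illustration only. It recognizes wild-type residue, position, and mutant residue written in one-letter or three-letter amino acid codes (for example `N107D` or `Asn249Tyr`) and ignores a bare residue mention such as `Asn107`.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_LETTER = "".join(sorted(THREE_TO_ONE.values()))
THREE_LETTER = "|".join(THREE_TO_ONE)

# Hypothetical pattern (an assumption, not taken from OMM):
# <wild-type residue><position><mutant residue>, in either notation.
MUTATION_PATTERN = re.compile(
    rf"\b(?:(?P<w3>{THREE_LETTER})(?P<p3>[1-9]\d*)(?P<m3>{THREE_LETTER})"
    rf"|(?P<w1>[{ONE_LETTER}])(?P<p1>[1-9]\d*)(?P<m1>[{ONE_LETTER}]))\b"
)


def find_point_mutations(text):
    """Return normalized one-letter mutation mentions, e.g. 'N249Y'."""
    mentions = []
    for match in MUTATION_PATTERN.finditer(text):
        if match.group("w3"):
            # Three-letter notation: convert both residues to one-letter codes.
            wild_type = THREE_TO_ONE[match.group("w3")]
            position = int(match.group("p3"))
            mutant = THREE_TO_ONE[match.group("m3")]
        else:
            wild_type = match.group("w1")
            position = int(match.group("p1"))
            mutant = match.group("m1")
        mentions.append(f"{wild_type}{position}{mutant}")
    return mentions


if __name__ == "__main__":
    sample = "The Asn249Tyr substitution and the N107D mutant of Asn107"
    print(find_point_mutations(sample))
```

A purely pattern-based detector like this one will also match unrelated tokens that happen to share the same shape, and it says nothing about which impact belongs to which mutation; those grounding and property-recognition steps are the harder problems the abstract describes.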
