Algorithms and semantic infrastructure for mutation impact extraction and grounding

Background
Mutation impact extraction is a task that state-of-the-art mutation extraction systems have not yet accomplished. Protein mutations and their impacts on protein properties are hidden in the scientific literature. This makes them poorly accessible to protein engineers and inaccessible to phenotype-prediction systems, which currently depend on manually curated genomic variation databases.

Results
We present the first rule-based approach for extracting mutation impacts on protein properties, categorizing their directionality as positive, negative or neutral. Furthermore, protein and mutation mentions are grounded to their respective UniProtKB IDs, and selected protein properties, namely protein functions, are grounded to concepts in the Gene Ontology. The extracted entities are used to populate an OWL-DL Mutation Impact ontology, which facilitates complex SPARQL queries over mutation impacts. We illustrate the retrieval of proteins and mutant sequences for a given direction of impact on specific protein properties. Moreover, we provide programmatic access to the data through semantic web services built with the SADI (Semantic Automated Discovery and Integration) framework.

Conclusion
We address the problem of accessing legacy mutation data in unstructured form by creating novel mutation impact extraction methods, which we evaluate on a corpus of full-text articles on haloalkane dehalogenases tagged by domain experts. Our approaches achieve state-of-the-art precision and recall for Mutation Grounding, and respectable precision but lower recall for Mutant-Impact relation extraction. The system is deployed using text mining and semantic web technologies, with the goal of publishing to a broad spectrum of consumers.
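To make the three directionality categories concrete, the following is a minimal Python sketch of a cue-word rule that labels a sentence as describing a positive, negative or neutral impact. The cue patterns and the function name `classify_impact` are hypothetical illustrations, not the rules used by the system described above, which the abstract does not spell out.

```python
import re
from typing import Dict, List, Optional

# Hypothetical cue lexicons (regular expressions). The leading \b keeps
# "stabiliz" from matching inside "destabilized".
CUES: Dict[str, List[str]] = {
    "neutral": [r"\bno (?:significant )?effect", r"\bdid not affect",
                r"\bunaffected\b", r"\bunchanged\b"],
    "positive": [r"\bincreas", r"\benhanc", r"\bimprov", r"\bhigher\b",
                 r"\bstabiliz"],
    "negative": [r"\bdecreas", r"\breduc", r"\blower\b", r"\babolish",
                 r"\bdestabiliz", r"\bloss of\b"],
}


def classify_impact(sentence: str) -> Optional[str]:
    """Return 'neutral', 'positive' or 'negative', or None if no cue matches.

    When several cues occur, the one appearing earliest in the sentence wins.
    """
    text = sentence.lower()
    best = None  # (character offset, label) of the earliest cue seen so far
    for label, patterns in CUES.items():
        for pattern in patterns:
            match = re.search(pattern, text)
            if match and (best is None or match.start() < best[0]):
                best = (match.start(), label)
    return best[1] if best else None


print(classify_impact("Replacement of tryptophan residues reduces halide binding."))
```

Picking the earliest cue is one simple way to resolve sentences with several cues. A rule this crude misreads negated cues such as "did not increase", and it does not link the impact to a specific protein property.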

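Grounding a mutation mention to a protein sequence can be illustrated in two steps. First, normalize the mention to a (wild type, position, mutant) triple. Second, keep only those candidate sequences that carry the stated wild-type residue at that position. The Python sketch below makes several assumptions. It assumes 1-based residue numbering that agrees with the database sequence, and it ignores the shifted numbering schemes that papers often use. The functions `normalize_mutation` and `ground_mutation`, the toy sequences and the identifiers `PROT_A`/`PROT_B` are hypothetical stand-ins, not real UniProtKB entries or the authors' implementation.

```python
import re
from typing import Dict, List, Optional, Tuple

THREE_TO_ONE = {
    "ala": "A", "arg": "R", "asn": "N", "asp": "D", "cys": "C",
    "gln": "Q", "glu": "E", "gly": "G", "his": "H", "ile": "I",
    "leu": "L", "lys": "K", "met": "M", "phe": "F", "pro": "P",
    "ser": "S", "thr": "T", "trp": "W", "tyr": "Y", "val": "V",
}
ONE_LETTER = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")
THREE_LETTER = re.compile(r"^([A-Za-z]{3})(\d+)([A-Za-z]{3})$")


def normalize_mutation(mention: str) -> Optional[Tuple[str, int, str]]:
    """Map 'D124N' or 'Asp124Asn' to ('D', 124, 'N'); return None if unparsable."""
    m = ONE_LETTER.match(mention)
    if m:
        return m.group(1), int(m.group(2)), m.group(3)
    m = THREE_LETTER.match(mention)
    if m:
        wild_type = THREE_TO_ONE.get(m.group(1).lower())
        mutant = THREE_TO_ONE.get(m.group(3).lower())
        if wild_type and mutant:
            return wild_type, int(m.group(2)), mutant
    return None


def ground_mutation(mention: str, candidates: Dict[str, str]) -> List[str]:
    """Return IDs of candidate sequences whose residue at the mutated position is the wild type."""
    parsed = normalize_mutation(mention)
    if parsed is None:
        return []
    wild_type, position, _mutant = parsed
    hits = []
    for identifier, sequence in candidates.items():
        # Mentions use 1-based positions; Python strings are 0-based.
        if 0 < position <= len(sequence) and sequence[position - 1] == wild_type:
            hits.append(identifier)
    return hits


toy_candidates = {"PROT_A": "MKDLE", "PROT_B": "MKELD"}
print(ground_mutation("D3N", toy_candidates))      # ['PROT_A']
print(ground_mutation("Leu4Ala", toy_candidates))  # ['PROT_A', 'PROT_B']
```

The second call shows why a sequence check alone can be ambiguous. Both toy sequences carry a leucine at position 4, so further context, such as the protein names mentioned nearby, would be needed to choose between them.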