Coupled mutation finder: A new entropy-based method quantifying phylogenetic noise for the detection of compensatory mutations

Background: The detection of significant compensatory mutation signals in multiple sequence alignments (MSAs) is often complicated by noise. A challenging problem in bioinformatics remains the separation of significant signals between two or more non-conserved residue sites from phylogenetic noise and unrelated pair signals. Determining these non-conserved residue sites is as important as recognizing strictly conserved positions for understanding the structural basis of protein function and for identifying functionally important residue regions. In this study, we developed a new method, the Coupled Mutation Finder (CMF), which quantifies phylogenetic noise in order to detect compensatory mutations.

Results: To demonstrate the effectiveness of this method, we analyzed essential sites of two human proteins: epidermal growth factor receptor (EGFR) and glucokinase (GCK). Our results suggest that the CMF is able to separate significant compensatory mutation signals from phylogenetic noise and unrelated pair signals. The vast majority of compensatory mutation sites found by the CMF are related to essential sites of both proteins and are likely to affect protein stability or function.

Conclusions: The CMF is a new method that combines two components. The first is an MSA-specific statistical model based on multiple testing procedures, which quantify the error in terms of the false discovery rate. The second is a novel entropy-based metric that upscales BLOSUM62-dissimilar compensatory mutations. It is therefore a helpful tool for predicting and investigating compensatory mutation sites of structural or functional importance in proteins. We suggest that the CMF could serve as a novel automated function prediction tool of the kind required for a better understanding of the structural basis of proteins. The CMF server is freely accessible at http://cmf.bioinf.med.uni-goettingen.de
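The abstract combines a co-variation score between MSA columns with multiple testing controlled by the false discovery rate. The sketch below shows generic versions of those two ideas under stated assumptions. It uses pairwise mutual information between columns as the score, a column-shuffling permutation test for p-values, and the Benjamini-Hochberg step-up procedure for FDR control.

It is not the CMF's MSA-specific statistical model or its BLOSUM62-based entropy metric, since the abstract does not specify either. In particular, shuffling a column also destroys phylogenetic structure, so this null model does not correct for phylogenetic noise. All function names and the toy alignment are hypothetical.

```python
import math
import random
from collections import Counter
from typing import List, Sequence


def shannon_entropy(column: Sequence) -> float:
    """Shannon entropy (bits) of the symbol distribution in one MSA column."""
    n = len(column)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(column).values())


def mutual_information(col_a: Sequence, col_b: Sequence) -> float:
    """Generic co-variation score MI(a, b) = H(a) + H(b) - H(a, b).

    Assumption: plain MI, not the CMF's BLOSUM62-upscaled entropy metric.
    """
    joint = list(zip(col_a, col_b))  # residue pairs, one per sequence
    return shannon_entropy(col_a) + shannon_entropy(col_b) - shannon_entropy(joint)


def permutation_p_value(col_a: Sequence, col_b: Sequence,
                        n_perm: int = 1000, seed: int = 0) -> float:
    """Empirical p-value for MI(a, b) under a column-shuffling null.

    Caveat: shuffling ignores phylogeny, so shared ancestry is NOT modelled here.
    """
    rng = random.Random(seed)
    observed = mutual_information(col_a, col_b)
    shuffled = list(col_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so float round-off does not hide ties with the observed MI.
        if mutual_information(col_a, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[int]:
    """Indices of hypotheses rejected by the BH step-up procedure at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        # Largest rank k with p_(k) <= k * q / m determines how many are rejected.
        if p_values[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])


if __name__ == "__main__":
    # Hypothetical toy alignment: 16 sequences x 4 columns.
    # Columns 0 and 1 co-vary perfectly; column 2 is conserved.
    msa = ["AKLG", "AKLW", "AKLG", "AKLW", "CELG", "CELW", "CELG", "CELW"] * 2
    columns = ["".join(seq[j] for seq in msa) for j in range(len(msa[0]))]
    pairs = [(i, j) for i in range(len(columns)) for j in range(i + 1, len(columns))]
    p_values = [permutation_p_value(columns[i], columns[j]) for i, j in pairs]
    for k in benjamini_hochberg(p_values, q=0.05):
        print("candidate coupled columns:", pairs[k], "p =", round(p_values[k], 4))
```

Controlling the FDR rather than the family-wise error rate matters here because the number of column pairs grows quadratically with alignment length. A Bonferroni-style correction would therefore become very conservative. A realistic pipeline would replace the shuffling null with one that accounts for the alignment's phylogeny.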
