Correlated Mutations: A Hallmark of Phenotypic Amino Acid Substitutions

Point mutations that substitute a single amino acid can have severe functional consequences, but they can also be completely harmless. Understanding what determines the phenotypic impact is important both for planning targeted mutation experiments in the laboratory and for analyzing naturally occurring mutations found in patients. Common wisdom suggests using the extent of evolutionary conservation of a residue or a sequence motif as an indicator of its functional importance and thus of its vulnerability to mutation. In this work, we put forward the hypothesis that, in addition to conservation, co-evolution of residues in a protein influences the likelihood that a residue is functionally important and thus associated with disease. While the basic idea of a relation between co-evolution and functional sites has been explored before, we have conducted the first systematic and comprehensive analysis of point mutations causing disease in humans with respect to correlated mutations. Our analysis included 14,211 distinct positions with known disease-causing point mutations in 1,153 human proteins. Our data show that (1) correlated positions are significantly more likely to be disease-associated than expected by chance, and that (2) this signal cannot be explained by the conservation patterns of individual sequence positions. Although correlated residues have primarily been used to predict contact sites, our data agree with previous observations that (3) many such correlations do not relate to physical contacts between amino acid residues. Our analysis results are available at http://webclu.bio.wzw.tum.de/~pagel/supplements/correlated-positions/.
