Exploring a phylogenetic approach for the detection of correlated substitutions in proteins.

The remarkable conservation of protein structure, compared with that of sequences, suggests that in the course of evolution, residue substitutions which tend to destabilize a particular structure must be compensated by other substitutions that confer greater stability on that structure. Several approaches have been designed to detect correlated changes in a set of homologous sequences. However, most of them do not take into account the phylogeny of the sequences, and their detection power has been shown to be weak. It remains unclear whether coevolution is a general process at the amino acid level in proteins. In the present study, we analyze the phylogenetic reconstruction of 15 sets of homologous proteins to assess, under different conditions, whether a significant number of coevolving sites can be detected. Two criteria are used to detect significantly cosubstituting sites. The first corresponds to that of Shindyalov, Kolchanov, and Sander. The second is based on intensive simulations of the evolution of protein sequences along a phylogeny, which are used to estimate the significance of the number of cosubstitutions observed for each pair of sites. Our results show that the detection of cosubstituting sites is highly sensitive to the model used for the phylogenetic reconstruction. Ignoring the uncertainty associated with the reconstructed data might lead to the detection of numerous false-positive pairs of sites. Finally, significant numbers of coevolving pairs could be found only when substitutions affecting the physicochemical properties of the amino acids were considered. These results suggest that a cosubstitution mechanism operates in protein evolution. However, identifying unambiguously coevolving sites remains an unresolved problem.
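The simulation-based criterion can be illustrated with a minimal sketch. It is not the procedure used in the study, which simulates full protein sequence evolution along the phylogeny. As a simplifying assumption, each site is reduced to the set of branches on which the reconstruction places a substitution, and the null model redistributes the same number of substitutions over branches independently for each site, with probability proportional to branch length. The function names, the branch labels, and the branch lengths are all hypothetical.

```python
import random


def cosubstitutions(branches_i, branches_j):
    """Count the branches on which both sites were substituted."""
    return len(set(branches_i) & set(branches_j))


def sample_branches(branch_lengths, k, rng):
    """Draw k distinct branches, favouring longer branches (assumed null model)."""
    names = [name for name, length in branch_lengths.items() if length > 0]
    if k > len(names):
        raise ValueError("more substitutions than branches with positive length")
    weights = [branch_lengths[name] for name in names]
    chosen = set()
    while len(chosen) < k:
        # Repeated draws of an already chosen branch are simply ignored.
        chosen.add(rng.choices(names, weights=weights, k=1)[0])
    return chosen


def cosubstitution_p_value(branch_lengths, branches_i, branches_j,
                           n_sim=10000, seed=0):
    """Monte Carlo p-value for the observed number of cosubstitutions.

    Returns (observed, p), where p estimates the probability of seeing at
    least `observed` shared branches when the two sites evolve independently.
    """
    rng = random.Random(seed)
    observed = cosubstitutions(branches_i, branches_j)
    k_i, k_j = len(set(branches_i)), len(set(branches_j))
    hits = 0
    for _ in range(n_sim):
        sim_i = sample_branches(branch_lengths, k_i, rng)
        sim_j = sample_branches(branch_lengths, k_j, rng)
        if cosubstitutions(sim_i, sim_j) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical tree with eight branches of equal length.
    lengths = {f"b{n}": 0.1 for n in range(8)}
    site_i = {"b0", "b3", "b5"}
    site_j = {"b0", "b3", "b5"}
    print(cosubstitution_p_value(lengths, site_i, site_j, n_sim=2000))
```

Because the input branch sets come from a reconstruction, a sketch like this inherits the reconstruction's uncertainty; treating reconstructed substitutions as known is exactly the situation the abstract warns can produce false-positive pairs.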

[1] C. Sander, et al. Can three-dimensional contacts in protein structures be predicted by analysis of correlated mutations?, 1994, Protein Engineering.

[2] A. Valencia, et al. Correlated mutations contain information about protein-protein interaction, 1997, Journal of Molecular Biology.

[3] Ziheng Yang, et al. PAML: a program package for phylogenetic analysis by maximum likelihood, 1997, Comput. Appl. Biosci.

[4] J. Thompson, et al. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice, 1994, Nucleic Acids Research.

[5] W. Taylor, et al. Effectiveness of correlation analysis in identifying protein residues undergoing correlated evolution, 1997, Protein Engineering.

[6] E. Neher. How frequent are correlated changes in families of protein sequences?, 1994, Proceedings of the National Academy of Sciences of the United States of America.

[7] C. Sander, et al. Correlated mutations and residue contacts in proteins, 1994, Proteins.

[8] Todd H. Oakley, et al. Reconstructing ancestral character states: a critical reappraisal, 1998, Trends in Ecology & Evolution.

[9] M. Nei, et al. A new method of inference of ancestral nucleotide and amino acid sequences, 1995, Genetics.

[10] W. Maddison. A method for testing the correlated evolution of two binary characters: are gains or losses concentrated on certain branches of a phylogenetic tree?, 1990, Evolution; International Journal of Organic Evolution.

[11] W. R. Taylor, et al. Coevolving protein residues: maximum likelihood identification and relationship to structure, 1999, Journal of Molecular Biology.

[12] G. Chelvanayagam, et al. An analysis of simultaneous variation in protein structures, 1997, Protein Engineering.

[13] M. Pagel. Detecting correlated evolution on phylogenies: a general method for the comparative analysis of discrete characters, 1994, Proceedings of the Royal Society of London. Series B: Biological Sciences.

[14] A. Lesk, et al. Correlation of co-ordinated amino acid substitutions with function in viruses related to tobacco mosaic virus, 1987, Journal of Molecular Biology.

[15] A. Valencia, et al. Improving contact predictions by the combination of correlated mutations and other sources of sequence information, 1997, Folding & Design.

[16] Andrew Rambaut, et al. Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees, 1997, Comput. Appl. Biosci.