Paper information - Detecting coevolving positions in a molecule: why and how to account for phylogeny

Positions in a molecule that share a common constraint do not evolve independently and therefore leave a signature in the patterns of homologous sequences. Identifying such coevolving positions from a sequence alignment has great potential for predicting functional and structural properties of molecules through comparative analysis. This task is complicated by additional sources of correlation, which lead to false predictions. The nature of the data is itself a major source of spurious correlation: sequences are sampled from individuals with different degrees of relatedness and are therefore intrinsically correlated. This has led to several method developments in different fields, which can be confusing for non-expert users interested in these methodologies. It also explains why coevolution detection methods remain little used despite the importance of the biological questions they address. In this article, I focus on the role of shared ancestry in understanding molecular coevolution patterns. I review and classify existing coevolution detection methods according to their ability to handle shared ancestry. Using a ribosomal RNA benchmark data set, for which detailed knowledge of the structure and coevolution patterns is available, I demonstrate and explain why taking the underlying evolutionary history of sequences into account is the only way to extract the full coevolution signal from the data. I also evaluate, using rigorous statistical procedures, the best approaches for doing so, and discuss several important biological aspects to consider when performing coevolution analyses.

Julien Dutheil
