Modeling Dependence in Evolutionary Inference for Proteins

Protein structure alignment is a classic problem in computational biology and is widely used to identify structural and functional similarity and to infer homology among proteins. A statistical model for protein structural evolution was previously introduced and shown to significantly improve phylogenetic inference compared with approaches that use only amino acid sequence information. Here we extend this model to account for correlated evolutionary drift among neighboring amino acid positions, yielding a spatio-temporal model of protein structure evolution. The resulting model is a multivariate diffusion process convolved with a spatial birth-death process, and it adds little computational cost or analytical complexity relative to the site-independent model (SIM). We demonstrate that this extended, site-dependent model (SDM) significantly reduces bias in estimated evolutionary distances and further improves phylogenetic tree reconstruction.
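To make the idea of correlated drift among neighboring positions concrete, the following is a minimal sketch and not the model as specified in the paper. It assumes, purely for illustration, that each coordinate follows an Ornstein-Uhlenbeck-type diffusion with a shared reversion rate `theta` and variance `sigma2`, and that the noise at sites i and j has correlation `rho**|i-j|` (an AR(1) structure along the chain). The function names and all parameters are hypothetical, and the birth-death (insertion/deletion) component is omitted.

```python
import math
import random


def ar1_noise(n, rho, rng):
    """Draw n unit-variance normals with correlation rho**|i-j| along the chain.

    The AR(1) neighbor structure is an assumption made for this sketch.
    """
    if n == 0:
        return []
    noise = [rng.gauss(0.0, 1.0)]
    scale = math.sqrt(1.0 - rho * rho)  # keeps the marginal variance at 1
    for _ in range(1, n):
        noise.append(rho * noise[-1] + scale * rng.gauss(0.0, 1.0))
    return noise


def evolve(x0, t, theta, sigma2, rho, rng):
    """Sample descendant coordinates after time t from ancestral coordinates x0.

    Uses the exact transition of an Ornstein-Uhlenbeck process (assumed form):
    mean x0 * exp(-theta * t), variance sigma2 / (2 * theta) * (1 - exp(-2 * theta * t)),
    with the noise correlated across neighboring sites.
    """
    decay = math.exp(-theta * t)
    sd = math.sqrt(sigma2 / (2.0 * theta) * (1.0 - decay * decay))
    noise = ar1_noise(len(x0), rho, rng)
    return [decay * x + sd * e for x, e in zip(x0, noise)]


if __name__ == "__main__":
    rng = random.Random(1)
    ancestor = [0.0, 0.5, 1.0, 1.5, 2.0]  # one coordinate per site (toy values)
    print(evolve(ancestor, t=0.5, theta=1.0, sigma2=1.0, rho=0.8, rng=rng))
```

In this sketch, setting `rho = 0` makes the sites evolve independently, which plays the role of the site-independent case, while `rho > 0` makes neighboring sites drift together.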
