Gene tree reconstruction and orthology analysis based on an integrated model for duplications and sequence evolution

Gene tree and species tree reconstruction, orthology analysis, and reconciliation are important problems in multigenome-based comparative genomics and in biology in general. In the present paper, we advance these areas in several respects and provide computational tools. First, we give exact algorithms for several probabilistic reconciliation problems with respect to the probabilistic gene evolution model previously developed by the authors. Until now, those problems were solved by MCMC estimation algorithms. Second, we extend the gene evolution model to the gene sequence evolution model by including sequence evolution. Third, we develop MCMC algorithms for the gene sequence evolution model that, given gene sequence data, allow: (1) orthology analysis, reconciliation analysis, and gene tree reconstruction with respect to a species tree, balancing how likely the reconciliation is against how likely the gene tree is, and (2) species tree reconstruction, balancing how likely the reconciliation is against how likely the gene trees are. These MCMC algorithms take advantage of the exact algorithms for the gene evolution model. We have successfully tested our dynamic programming algorithms on real data for a biogeography problem. The MCMC algorithms perform very well on both synthetic and biological data.
