On calculating the probability of a set of orthologous sequences

Probabilistic DNA sequence models have been applied intensively in genome research. Within an evolutionary biology framework, this article investigates the feasibility of rigorously estimating the probability of a set of orthologous DNA sequences that evolved from a common progenitor. We propose Monte Carlo integration algorithms that sample the unknown ancestral and/or root sequences from their posterior distribution conditional on a reference sequence, and then apply pairwise Needleman–Wunsch alignment between each sampled sequence and the nonreference species sequences to estimate the probability. We test our algorithms on both simulated and real sequences, and compare the probabilities calculated by Monte Carlo integration with those induced by a single multiple alignment.
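The Monte Carlo integration idea can be illustrated with a deliberately simplified sketch. It is not the algorithm proposed in the article. It assumes a substitution-only model with no insertions or deletions, so a position-wise comparison stands in for the Needleman–Wunsch alignment step. It also assumes a uniform prior over ancestral bases and a symmetric per-site substitution probability `p_sub`, so sampling the ancestor given the reference uses the same substitution model. All function names and parameters are hypothetical. The sketch averages the likelihood of a nonreference sequence over ancestors sampled conditional on the reference, and compares the result with the closed-form value available in this toy setting.

```python
import random

BASES = "ACGT"


def sample_ancestor(reference, p_sub, rng):
    """Draw an ancestor given the reference (toy model: each site differs
    from the reference with probability p_sub, uniformly over other bases)."""
    sites = []
    for base in reference:
        if rng.random() < p_sub:
            sites.append(rng.choice([c for c in BASES if c != base]))
        else:
            sites.append(base)
    return "".join(sites)


def likelihood(descendant, ancestor, p_sub):
    """P(descendant | ancestor) under the same substitution-only model."""
    prob = 1.0
    for d, a in zip(descendant, ancestor):
        prob *= (1.0 - p_sub) if d == a else p_sub / 3.0
    return prob


def mc_probability(reference, other, p_sub, n_samples, seed=0):
    """Monte Carlo estimate of P(other | reference): average the likelihood
    of `other` over ancestors sampled conditional on `reference`."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        ancestor = sample_ancestor(reference, p_sub, rng)
        total += likelihood(other, ancestor, p_sub)
    return total / n_samples


def exact_probability(reference, other, p_sub):
    """Closed-form P(other | reference), summing over the ancestor per site."""
    q = p_sub / 3.0
    same = (1.0 - p_sub) ** 2 + 3.0 * q * q        # sites where bases agree
    diff = 2.0 * (1.0 - p_sub) * q + 2.0 * q * q   # sites where bases differ
    prob = 1.0
    for r, o in zip(reference, other):
        prob *= same if r == o else diff
    return prob


if __name__ == "__main__":
    ref, other = "ACGTACGT", "ACGTACGA"
    print("Monte Carlo:", mc_probability(ref, other, 0.1, 20000))
    print("Exact:      ", exact_probability(ref, other, 0.1))
```

In this toy setting the exact sum over ancestors factorizes by site, which makes the estimate easy to check. Once insertions and deletions are allowed, no such factorization is available, which is the situation where sampling ancestors and aligning them to the other sequences becomes relevant.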
