Characterizing positive and negative selection and their phylogenetic effects.

Total evidence and the use of large datasets to overcome uncertainty are the state of the art in systematic analysis. This approach assumes that the only true phylogenetic signal is ancestry and that functional, structural, and other factors do not add an alternative signal. Using gene families in which individual codon positions were sorted into bins according to their average pairwise dN/dS ratio, we show that standard phylogenetic methods, which were designed for stochastic, neutral, site-independent processes, generate less robust phylogenetic signal for bins under strong negative or positive selection. This held for parsimony, distance, and likelihood reconstruction alike. Further, we present a case for the potential existence of systematic functional or structural signal that competes with ancestral signal. As an example of positive selection, we simulate sequence evolution through three-dimensional lattice constructs with a folding constraint and changing binding functionality. We show that total-evidence trees for these lattice genes carry functional signal, whereas the neutral synonymous sites in these genes recover the true ancestral signal. In this case, functional convergence promotes sequence convergence.
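
The codon-binning step can be illustrated with a short Python sketch. The abstract does not say how per-site dN/dS was computed or where the bin boundaries lay, so everything below is an assumption: Nei-Gojobori-style counting of synonymous and nonsynonymous sites and differences (averaging over mutational paths and discarding paths through stop codons), uncorrected p-distances, counts pooled over all sequence pairs in a column rather than an average of per-pair ratios (per-codon pairwise ratios are often undefined because dS is zero), and hypothetical bin edges. The function names are illustrative only.

```python
import bisect
import math
from itertools import combinations, permutations, product

# Standard genetic code; codons enumerated in TCAG order, "*" marks stops.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Synonymous sites of one codon: fraction of single-base changes that are
    synonymous, summed over positions (changes to stops count as nonsynonymous)."""
    s = 0.0
    for i in range(3):
        for b in BASES:
            if b != codon[i] and CODE[codon[:i] + b + codon[i + 1:]] == CODE[codon]:
                s += 1 / 3
    return s


def codon_differences(c1, c2):
    """(synonymous, nonsynonymous) differences between two codons, averaged
    over all mutational orders that avoid intermediate stop codons."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    if not diff:
        return 0.0, 0.0
    syn = nonsyn = 0.0
    n_paths = 0
    for order in permutations(diff):
        cur, s, n, ok = c1, 0, 0, True
        for i in order:
            nxt = cur[:i] + c2[i] + cur[i + 1:]
            if CODE[nxt] == "*":
                ok = False  # path passes through a stop codon; discard it
                break
            if CODE[nxt] == CODE[cur]:
                s += 1
            else:
                n += 1
            cur = nxt
        if ok:
            syn, nonsyn, n_paths = syn + s, nonsyn + n, n_paths + 1
    if n_paths == 0:
        # Assumed convention: if every path hits a stop, call all changes nonsynonymous.
        return 0.0, float(len(diff))
    return syn / n_paths, nonsyn / n_paths


def column_dnds(codons):
    """Pooled pairwise dN/dS for one codon column (uncorrected p-distances)."""
    nd = sd = n_sites = s_sites = 0.0
    for a, b in combinations(codons, 2):
        s = (syn_sites(a) + syn_sites(b)) / 2
        s_sites += s
        n_sites += 3 - s
        ds, dn = codon_differences(a, b)
        sd += ds
        nd += dn
    if s_sites == 0 or n_sites == 0:
        return math.nan  # fewer than two sequences, or no synonymous sites at all
    pn, ps = nd / n_sites, sd / s_sites
    if ps == 0:
        return math.inf if pn > 0 else math.nan  # nan = invariant column
    return pn / ps


def bin_columns(seqs, edges=(0.1, 0.5, 1.0, 2.0)):
    """Map bin index -> codon column indices; the edges are hypothetical."""
    seqs = [s.upper() for s in seqs]
    bins = {}
    for k in range(len(seqs[0]) // 3):
        col = [s[3 * k:3 * k + 3] for s in seqs]
        if any(c not in CODE or CODE[c] == "*" for c in col):
            continue  # skip gaps, ambiguity codes, and stop codons
        w = column_dnds(col)
        if math.isnan(w):
            continue
        bins.setdefault(bisect.bisect_right(edges, w), []).append(k)
    return bins
```

For example, `bin_columns(["CTAAAAGGG", "CTGAACGGG"])` places the purely synonymous first codon in the lowest bin, the purely nonsynonymous second codon (dN/dS infinite) in the highest bin, and skips the invariant third codon. Each bin could then be concatenated into its own alignment and analysed with parsimony, distance, or likelihood software to compare tree robustness across bins; that step is not shown.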
