Evolutionary fingerprinting of genes.

Over time, natural selection molds every gene into a unique mosaic of sites that evolve rapidly or resist change: an "evolutionary fingerprint" of the gene. Aspects of this evolutionary fingerprint, such as the site-specific ratio of nonsynonymous to synonymous substitution rates (dN/dS), are commonly used to identify genetic features of potential biological interest; however, no framework exists for comparing evolutionary fingerprints between genes. We hypothesize that protein-coding genes with similar protein structure and/or function tend to have similar evolutionary fingerprints, and that comparing evolutionary fingerprints can reveal similarities between genes in a way that is analogous to, but independent of, similarity searches with sequence-based comparison tools such as BLAST. To test this hypothesis, we develop a novel model of coding sequence evolution that uses a general bivariate discrete parameterization of the synonymous and nonsynonymous substitution rates. We show that this approach fits the data better than existing models while using fewer parameters. Next, we use the model to represent evolutionary fingerprints as probability distributions and present a methodology for comparing these distributions that is robust to variation in data set size and divergence. Finally, using sequences of three rapidly evolving RNA viruses (HIV-1, hepatitis C virus, and influenza A virus), we demonstrate that genes within the same functional group tend to have similar evolutionary fingerprints. Our framework provides a sound statistical foundation for efficiently inferring and comparing evolutionary rate patterns in arbitrary collections of gene alignments, for clustering homologous and nonhomologous genes, and for investigating biological and functional correlates of evolutionary rates.
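The idea of treating a fingerprint as a probability distribution and measuring how far apart two such distributions are can be illustrated with a deliberately simplified sketch. Here a fingerprint is assumed to be a small set of rate classes, each an (alpha, beta, weight) triple giving a synonymous rate, a nonsynonymous rate, and the fraction of sites in that class. The abstract does not name the distance it uses; this sketch assumes the Earth Mover's (Wasserstein-1) distance and, to stay short and dependency-free, collapses each bivariate distribution onto the single axis log(beta/alpha), where that distance has a closed form. The names `RateClass`, `log_omega_distribution`, `emd_1d`, and `fingerprint_distance` are hypothetical, and the full bivariate comparison described above is not reproduced.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation of one rate class: (alpha, beta, weight), where
# alpha is the synonymous rate, beta the nonsynonymous rate, and weight the
# fraction of sites assigned to the class.
RateClass = Tuple[float, float, float]


def log_omega_distribution(fingerprint: List[RateClass]) -> List[Tuple[float, float]]:
    """Collapse a bivariate (alpha, beta) fingerprint onto log(dN/dS).

    Returns (log(beta/alpha), weight) pairs sorted by position, with weights
    normalised to sum to 1 and classes at the same position merged.
    """
    total = sum(w for _, _, w in fingerprint)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    merged: Dict[float, float] = {}
    for alpha, beta, w in fingerprint:
        if alpha <= 0 or beta <= 0:
            raise ValueError("rates must be positive to take log(beta/alpha)")
        x = math.log(beta / alpha)
        merged[x] = merged.get(x, 0.0) + w / total
    return sorted(merged.items())


def emd_1d(p: List[Tuple[float, float]], q: List[Tuple[float, float]]) -> float:
    """Earth Mover's (Wasserstein-1) distance between two 1-D distributions.

    Both inputs must have unit total mass. In one dimension the EMD equals the
    integral of |F_p(x) - F_q(x)| dx, which for discrete distributions is a sum
    over the gaps between consecutive support points.
    """
    points = sorted({x for x, _ in p} | {x for x, _ in q})
    wp, wq = dict(p), dict(q)
    cdf_p = cdf_q = 0.0
    distance = 0.0
    for left, right in zip(points, points[1:]):
        # Both CDFs are constant on [left, right).
        cdf_p += wp.get(left, 0.0)
        cdf_q += wq.get(left, 0.0)
        distance += abs(cdf_p - cdf_q) * (right - left)
    return distance


def fingerprint_distance(f1: List[RateClass], f2: List[RateClass]) -> float:
    """Simplified fingerprint distance: EMD between log(dN/dS) distributions."""
    return emd_1d(log_omega_distribution(f1), log_omega_distribution(f2))


if __name__ == "__main__":
    # Two made-up fingerprints: mostly conserved sites plus a neutral class.
    gene_a = [(1.0, 0.1, 0.7), (1.0, 1.0, 0.3)]
    gene_b = [(1.0, 0.1, 0.5), (1.0, 1.0, 0.5)]
    print(round(fingerprint_distance(gene_a, gene_b), 4))  # 0.4605
```

Because only the ratio beta/alpha enters this simplified distance, multiplying every rate in a fingerprint by a common factor leaves the result unchanged; the price is that collapsing to one axis discards information carried by the synonymous rate on its own.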
