Determining the evolutionary history of gene families

MOTIVATION: Recent large-scale studies of individuals within a population have demonstrated that there is widespread variation in copy number in many gene families. In addition, there is increasing evidence that the variation in gene copy number can give rise to substantial phenotypic effects. In some cases, these variations have been shown to be adaptive. These observations show that a full understanding of the evolution of biological function requires an understanding of gene gain and gene loss. Accurate, robust evolutionary models of gain and loss events are, therefore, required.

RESULTS: We have developed weighted parsimony and maximum likelihood methods for inferring gain and loss events. To test these methods, we have used Markov models of gain and loss to simulate data with known properties. We examine three models: a simple birth-death model, a single rate model and a birth-death innovation model with parameters estimated from Drosophila genome data. We find that for all simulations maximum likelihood-based methods are very accurate for reconstructing the number of duplication events on the phylogenetic tree, and that maximum likelihood and weighted parsimony have similar accuracy for reconstructing the ancestral state. Our implementations are robust to different model parameters and provide accurate inferences of ancestral states and the number of gain and loss events. For ancestral reconstruction, we recommend weighted parsimony because it has similar accuracy to maximum likelihood, but is much faster. For inferring the number of individual gene loss or gain events, maximum likelihood is noticeably more accurate, albeit at greater computational cost.

AVAILABILITY: www.bioinf.manchester.ac.uk/dupliphy

CONTACT: simon.lovell@manchester.ac.uk; simon.whelan@manchester.ac.uk

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
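
To make the idea of weighted parsimony on gene copy numbers concrete, the following is a minimal sketch of a Sankoff-style dynamic programme that reconstructs the ancestral copy number at the root of a small tree. It is an illustration only and is not the authors' implementation: the nested-tuple tree encoding, the function names (`step_cost`, `sankoff`, `reconstruct_root`), the linear per-copy cost scheme and the example cost values are all assumptions introduced here.

```python
from typing import List, Tuple, Union

# Hypothetical tree encoding: a leaf is an observed copy number (int),
# an internal node is a tuple of child subtrees.
Tree = Union[int, Tuple["Tree", ...]]
INF = float("inf")


def step_cost(parent: int, child: int, gain_cost: float, loss_cost: float) -> float:
    """Cost of changing copy number along one branch (assumed linear per copy)."""
    if child >= parent:
        return gain_cost * (child - parent)
    return loss_cost * (parent - child)


def sankoff(tree: Tree, max_state: int, gain_cost: float, loss_cost: float) -> List[float]:
    """Return, for each candidate copy number at this node, the minimum
    total cost of the subtree below it."""
    states = range(max_state + 1)
    if isinstance(tree, int):
        # A leaf is fixed at its observed copy number.
        return [0.0 if s == tree else INF for s in states]
    costs = [0.0] * (max_state + 1)
    for child in tree:
        child_costs = sankoff(child, max_state, gain_cost, loss_cost)
        for s in states:
            # Cheapest way to reach any child state t from parent state s.
            costs[s] += min(
                step_cost(s, t, gain_cost, loss_cost) + child_costs[t] for t in states
            )
    return costs


def reconstruct_root(tree: Tree, max_state: int,
                     gain_cost: float = 1.0, loss_cost: float = 1.0) -> Tuple[int, float]:
    """Return the lowest-cost root copy number and its total cost."""
    costs = sankoff(tree, max_state, gain_cost, loss_cost)
    best = min(range(max_state + 1), key=costs.__getitem__)
    return best, costs[best]


if __name__ == "__main__":
    # Four species with copy numbers 1, 1, 2 and 3 (made-up example data).
    print(reconstruct_root(((1, 1), (2, 3)), max_state=3))
```

Making `gain_cost` and `loss_cost` differ gives an asymmetric weighting, so that gains and losses need not be penalised equally; ties between equally cheap root states are broken here by taking the smallest copy number, which is an arbitrary choice.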
