GenomeHistory: a software tool and its application to fully sequenced genomes.

We present a publicly available software tool, GenomeHistory (http://www.unm.edu/~compbio/software/GenomeHistory), that identifies all pairs of duplicate genes in a genome and then estimates the synonymous and non-synonymous divergence between the two members of each pair. Using this tool, we analyze the relationship between (i) gene function and the propensity of a gene to duplicate and (ii) the number of genes in a gene family and the family's rate of sequence evolution. We do so for the complete genomes of four eukaryotes (fission yeast, budding yeast, fruit fly and nematode) and one prokaryote (Escherichia coli). For some classes of genes, we observe a strong relationship between gene function and the propensity to undergo duplication. Most notably, ribosomal genes and transcription factors appear less likely to duplicate than other genes. In both fission and budding yeast, we see a strong positive correlation between the selective constraint on a gene and the size of the gene family to which it belongs. In contrast, the multicellular eukaryotes show a weak negative correlation between these two quantities.
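The family-level analysis described above can be illustrated with a minimal sketch. It assumes that duplicate pairs are already available as tuples of (gene, gene, Ka, Ks), that gene families are taken to be the connected components of the duplicate-pair graph (single linkage), and that the mean Ka/Ks ratio of a family serves as a proxy for selective constraint, with lower values indicating stronger constraint. These choices, the function names, and the gene identifiers and divergence values below are all hypothetical and used for illustration only; the abstract does not specify how GenomeHistory itself groups genes or summarizes divergence.

```python
from collections import defaultdict


def gene_families(pairs):
    """Group genes into families (assumed: connected components of the pair graph)."""
    parent = {}

    def find(gene):
        # Union-find lookup with path halving.
        parent.setdefault(gene, gene)
        while parent[gene] != gene:
            parent[gene] = parent[parent[gene]]
            gene = parent[gene]
        return gene

    for a, b, _ka, _ks in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    families = defaultdict(set)
    for gene in list(parent):
        families[find(gene)].add(gene)
    return list(families.values())


def family_constraint(pairs, families):
    """Return (family size, mean Ka/Ks) per family, skipping pairs with Ks == 0."""
    index = {gene: i for i, fam in enumerate(families) for gene in fam}
    ratios = defaultdict(list)
    for a, _b, ka, ks in pairs:
        if ks > 0:
            ratios[index[a]].append(ka / ks)
    return [(len(families[i]), sum(r) / len(r)) for i, r in sorted(ratios.items())]


# Hypothetical duplicate pairs: (gene, gene, Ka, Ks). Values are made up.
pairs = [
    ("geneA", "geneB", 0.05, 0.50),
    ("geneB", "geneC", 0.10, 0.50),
    ("geneD", "geneE", 0.30, 0.60),
]
families = gene_families(pairs)
summary = family_constraint(pairs, families)
for size, mean_ratio in summary:
    print(f"family size {size}: mean Ka/Ks = {mean_ratio:.2f}")
```

Correlating the two columns of `summary` across all families in a genome would give the kind of family-size versus constraint relationship the abstract reports. A real analysis would also need to handle saturated Ks estimates, which this sketch ignores.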
