Large-scale taxonomic profiling of eukaryotic model organisms: a comparison of orthologous proteins encoded by the human, fly, nematode, and yeast genomes.

Comparisons of DNA and protein sequences between humans and model organisms, including the yeast Saccharomyces cerevisiae, the nematode Caenorhabditis elegans, and the fruit fly Drosophila melanogaster, are a significant source of information about the function of human genes and proteins in both normal and disease states. Important questions about cross-species sequence comparison remain unanswered, including (1) what fraction of the metabolic, signaling, and regulatory pathways is shared by humans and the various model organisms, and (2) how valid functional inferences based on sequence homology are. We addressed these questions by searching the available portions of the human, fly, nematode, and yeast genomes for orthologous protein-coding genes, applying strict criteria to distinguish candidate orthologs from paralogs. Forty-two quartets of proteins could be identified as candidate orthologs. Twenty-four Drosophila protein sequences were more similar to their human orthologs than were the corresponding nematode proteins. Analysis of sequence substitutions and evolutionary distances in this data set revealed that most C. elegans genes are evolving more rapidly than their Drosophila counterparts, suggesting that unequal evolutionary rates may contribute to the differences in similarity to human protein sequences. The available set of Drosophila proteins appears to lack representatives of many protein families and domains, reflecting the relative paucity of genomic data from this species.
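The rate argument above (a faster-evolving nematode lineage making C. elegans proteins look less similar to human ones) can be illustrated with a relative-rate comparison against an outgroup. The abstract does not state which distance measure was used, so the sketch below is an assumption: it uses p-distances over gap-free alignment columns, the Poisson correction d = -ln(1 - p), and yeast as the outgroup to both animals. The function names and the toy quartet are hypothetical and are not data or code from this study.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment (equal length)")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p), in substitutions per site (assumed model)."""
    p = p_distance(a, b)
    return math.inf if p >= 1.0 else -math.log(1.0 - p)


def relative_rate(outgroup: str, fly: str, worm: str) -> float:
    """Return d(outgroup, worm) - d(outgroup, fly).

    The outgroup (yeast, for animal proteins) branched off before fly and worm
    diverged, so if distances are roughly additive, a positive value means more
    substitutions accumulated on the worm lineage than on the fly lineage.
    """
    return poisson_distance(outgroup, worm) - poisson_distance(outgroup, fly)


# Hypothetical toy quartet, already aligned (10 columns); not data from the study.
quartet = {
    "human": "MKTAYIAKQR",
    "fly":   "MKTAYIAKQH",
    "worm":  "MKSAYLAKEH",
    "yeast": "MRTAYVAKQR",
}

if __name__ == "__main__":
    print(p_distance(quartet["human"], quartet["fly"]))    # 0.1
    print(p_distance(quartet["human"], quartet["worm"]))   # 0.4
    rr = relative_rate(quartet["yeast"], quartet["fly"], quartet["worm"])
    print(round(rr, 3))                                     # 0.336
```

In this toy case the fly protein is closer to the human one (p = 0.1 versus 0.4), and the positive relative rate against yeast points to a longer worm branch, which is the pattern the abstract describes. A real analysis would use properly aligned orthologs and a substitution model chosen for the data.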
