Identification of novel human genes evolutionarily conserved in Caenorhabditis elegans by comparative proteomics.

Modern biomedical research benefits greatly from large-scale genome-sequencing projects, ranging from studies of viruses, bacteria, and yeast to multicellular organisms such as Caenorhabditis elegans. Comparative genomic studies offer many opportunities for identifying and functionally annotating human orthologous genes. We present a novel comparative proteomic approach for assembling human gene contigs and assisting gene discovery, in which the C. elegans proteome serves as an alignment template for identifying novel human genes in human EST nucleotide databases. Of the 18,452 available C. elegans protein sequences, at least 83% (15,344 sequences) have human homologs, and 7,954 of these match known human gene transcripts. Nematode-specific genes account for only 11% or less of the C. elegans proteome. The remaining 7,390 sequences with human homologs but no known human transcript match may lead to the discovery of novel human genes, and further database analyses allowed us to assemble over 150 putative full-length human gene transcripts. [The sequence data described in this paper have been submitted to the
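The bookkeeping behind the headline numbers can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each C. elegans protein has already been searched against human transcript and EST collections (the abstract does not name the search tool) and reduced to a hypothetical best-hit label, `"mrna"` for a known human gene transcript, `"est"` for an EST-only match, or `None` for no significant hit. The category names and the `classify`/`summarize` functions are illustrative assumptions; the sketch only reproduces the partition arithmetic stated above (15,344 with homologs = 7,954 known + 7,390 candidates).

```python
from collections import Counter

# Hypothetical category labels for the best human hit of each worm protein.
KNOWN = "known_human_gene"  # hit to an annotated human gene transcript
EST_ONLY = "est_only"       # hit only to human ESTs: candidate novel human gene
NO_HIT = "no_human_hit"     # no significant human match (not all are
                            # necessarily nematode-specific)


def classify(best_hit_source):
    """Map a hypothetical best-hit source ('mrna', 'est', or None) to a category."""
    if best_hit_source == "mrna":
        return KNOWN
    if best_hit_source == "est":
        return EST_ONLY
    return NO_HIT


def summarize(category_counts):
    """Return totals and the percentage of proteins with a human homolog."""
    total = sum(category_counts.values())
    # Proteins with any human homolog = known-gene matches + EST-only matches.
    with_homolog = category_counts[KNOWN] + category_counts[EST_ONLY]
    return {
        "total": total,
        "with_human_homolog": with_homolog,
        "pct_with_human_homolog": round(100 * with_homolog / total, 1),
        "known_human_gene": category_counts[KNOWN],
        "candidate_novel": category_counts[EST_ONLY],
    }


if __name__ == "__main__":
    # Toy input: best-hit sources for five proteins.
    toy = Counter(classify(s) for s in ["mrna", "est", None, "mrna", "est"])
    print(summarize(toy))

    # Counts taken from the abstract.
    paper = Counter({KNOWN: 7954, EST_ONLY: 15344 - 7954, NO_HIT: 18452 - 15344})
    print(summarize(paper))
```

With the abstract's counts, `summarize` reports 15,344 of 18,452 proteins (83.2%) with a human homolog and 7,390 EST-only candidates. Keeping EST-only hits as a separate class mirrors the idea that these are the entries worth assembling into putative full-length transcripts.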
