Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences

The National Institutes of Health Mammalian Gene Collection (MGC) Program is a multi-institutional effort to identify and sequence a cDNA clone containing a complete ORF for each human and mouse gene. Expressed sequence tags (ESTs) were generated from libraries enriched for full-length cDNAs and analyzed to identify candidate full-ORF clones, which were then sequenced to high accuracy. To date, the MGC has sequenced and verified the full ORF for a nonredundant set of >9,000 human and >6,000 mouse genes. Candidate full-ORF clones for an additional 7,800 human and 3,500 mouse genes have also been identified. All MGC sequences and clones are available without restriction through public databases and clone distribution networks (see http://mgc.nci.nih.gov).
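To make "complete ORF" concrete: a clone qualifies only if its sequence contains an in-frame start codon (ATG) followed by an in-frame stop codon, rather than running off the end of the insert. The sketch below is a simplified illustration of that check, not the MGC's actual candidate-selection procedure. It assumes the cDNA is already oriented 5'→3' (so only the forward strand is scanned), and the function name `longest_complete_orf` and the default `min_codons=100` length cutoff are hypothetical choices made for illustration.

```python
# Hypothetical sketch: find the longest complete ORF (ATG ... in-frame stop)
# on the forward strand of an oriented cDNA sequence. Not the MGC pipeline.
from typing import Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_complete_orf(seq: str, min_codons: int = 100) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest complete ORF, or None.

    `start` is the 0-based index of the ATG; `end` is exclusive and includes
    the stop codon. ORFs without an in-frame stop (i.e. truncated at the
    3' end of the clone) are not reported. `min_codons` is an assumed
    cutoff counted including the stop codon.
    """
    seq = seq.upper()
    best: Optional[Tuple[int, int]] = None
    for frame in range(3):
        start = None  # position of the first ATG in the current open stretch
        # Stop at len(seq) - 2 so every slice is a full 3-base codon.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                long_enough = (end - start) // 3 >= min_codons
                if long_enough and (best is None or end - start > best[1] - best[0]):
                    best = (start, end)
                start = None  # look for the next ATG after this stop
    return best


if __name__ == "__main__":
    s = "CCATGAAATTTTAGCC"
    span = longest_complete_orf(s, min_codons=1)
    print(span, s[span[0]:span[1]])
```

Run on the toy sequence, it reports the span `(2, 14)`, covering `ATGAAATTTTAG`. A real pipeline would also need to handle things this sketch ignores, such as comparing the predicted protein against known reference sequences to decide whether the ORF is truly full length.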
