The status, quality, and expansion of the NIH full-length cDNA project: the Mammalian Gene Collection (MGC).

The National Institutes of Health's Mammalian Gene Collection (MGC) project was designed to generate and sequence a publicly accessible cDNA resource containing a complete open reading frame (ORF) for every human and mouse gene. The project initially used a random strategy to select clones from a large number of cDNA libraries from diverse tissues. Candidate clones were chosen based on 5'-EST sequences, then fully sequenced to high accuracy and analyzed by algorithms developed for this project. Currently, more than 11,000 human and 10,000 mouse genes are represented in MGC by at least one clone with a full ORF. The random selection approach is reaching saturation, and a transition to protocols targeted at the missing transcripts is required to complete the mouse and human collections. Comparison of the MGC clone sequences to reference genome sequences reveals that most cDNA clones are of very high sequence quality, although some cDNAs likely carry missense variants as a consequence of experimental artifacts, such as PCR, cloning, or reverse transcriptase errors. Recently, a rat cDNA component was added to the project, and ongoing frog (Xenopus) and zebrafish (Danio) cDNA projects were expanded to take advantage of the high-throughput MGC pipeline.

Ryan D. Morin | Dawood B. Dudekula | Stephen L. Johnson | D. Haussler | M. Brent | D. Lipman | R. Gibbs | R. Myers | J. McPherson | M. Krzywinski | T. Moore | F. Collins | K. Kawakami | O. Griffith | D. Muzny | D. Gerhard | C. Schaefer | K. Buetow | P. Good | M. Guyer | Stephanie A Bosak | J. Schein | M. Feolo | E. Green | J. Malek | M. Marra | M. Dickson | Jiaqian Wu | Piero Carninci | Chia-Lin Wei | Y. Ruan | E. Feingold | G. Bouffard | Alice C Young | L. Grouse | S. Greenhut | M. Ko | S. Sugano | Yutaka Suzuki | L. Wagner | C. M. Shenmen | G. Schuler | S. L. Klein | S. Old | R. Rasooly | Allison M Peck | J. G. Derge | W. Jang | Steven Sherry | L. Misquitta | Eduardo Lee | K. Rotmistrovsky | T. Bonner | J. Kent | Mark Kiekhaus | Terry Furey | C. Prange | Kirsten Schreiber | N. Shapiro | N. Bhat | R. Hopkins | Florence Hsie | Tom Driscoll | M. Soares | T. Casavant | T. Scheetz | Michael J Brownstein | T. Usdin | Shiraki Toshiyuki | Y. Piao | C. Gruber | M. R. Smith | B. Simmons | R. Waterman | S. Mathavan | P. Gunaratne | A. Garcia | Stephen W Hulyk | Edwin Fuh | Ye Yuan | Anna Sneed | Carla Kowis | A. Hodgson | J. Fahey | Erin Helton | Mark Ketteman | A. Madan | Stephanie D Rodrigues | Amy Sanchez | Michelle Whiting | A. Madari | K. Wetherby | S. Granite | Peggy N Kwong | C. Brinkley | R. Pearson | Robert W Blakesly | Alex C Rodriguez | J. Grimwood | J. Schmutz | Y. Butterfield | M. Griffith | Nancy Y. Liao | Diana L Palmquist | A. Petrescu | U. Skalska | D. Smailus | J. Stott | A. Schnerch | Steven J. M. Jones | R. Holt | Á. Baross | S. Clifton | K. Makowski
