Systematic sequencing of cDNA clones using the transposon Tn5.

In parallel with the production of genomic sequence data, attention is being focused on the generation of comprehensive cDNA-sequence resources. Such efforts increasingly emphasize the production of high-accuracy sequence corresponding to the entire insert of cDNA clones, especially those presumed to reflect the full-length mRNA. The complete sequencing of cDNA clones on a large scale presents unique challenges because of the generally small, yet heterogeneous, sizes of the cloned inserts. We have developed a strategy for high-throughput sequencing of cDNA clones using the transposon Tn5. This approach has been tailored for implementation within an existing large-scale 'shotgun-style' sequencing program, although it could be readily adapted for use in virtually any sequencing environment. In addition, we have developed a modified version of our strategy that can be applied to cDNA clones with large cloning vectors, thereby overcoming a potential limitation of transposon-based approaches. Here we describe the details of our cDNA-sequencing pipeline, including a summary of our experience in sequencing more than 4200 cDNA clones to produce more than 8 million base pairs of high-accuracy cDNA sequence. These data provide both convincing evidence that the insertion of Tn5 into cDNA clones is sufficiently random for its effective use in large-scale cDNA sequencing and interesting insight into the sequence context preferred for insertion by Tn5.
