Computer-based methods for the mouse full-length cDNA encyclopedia: real-time sequence clustering for construction of a nonredundant cDNA library.

We developed computer-based methods for constructing a nonredundant mouse full-length cDNA library. Our library construction process comprises three stages: assessing library quality, sequencing and clustering the 3' ends of inserts, and re-arraying clones to generate a nonredundant library from a redundant one. After each cDNA library is generated, we sequence the 5' ends of its inserts to check library quality and then assign the library a sequencing priority. Selected libraries undergo large-scale sequencing of the 3' ends of their inserts, and the resulting tag sequences are clustered. After clustering, the nonredundant library is constructed by re-arraying clones from the original, redundant libraries. All libraries, plates, clones, sequences, and clusters are assigned unique identifiers, and all information is stored in the database under these identifiers. At the time of writing, the system had been in operation for two years, and we had clustered 939,725 3'-end sequences into 127,385 groups from 227 cDNA libraries/sublibraries (see http://genome.gse.riken.go.jp/).
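The abstract does not state the similarity criterion used to cluster 3'-end tags or how a representative clone is chosen for re-array. The sketch below illustrates the clustering and re-array steps under stated assumptions only. Each newly sequenced tag is assigned to a cluster as soon as it arrives, consistent with the "real-time" clustering in the title. Similarity is measured as k-mer Jaccard overlap with each cluster's founding tag, a stand-in for alignment-based comparison. The longest tag in each cluster is picked for re-array. The names `Clone`, `Cluster`, `IncrementalClusterer`, `K`, `THRESHOLD`, and the clone ID format are hypothetical and do not describe the authors' actual system.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

K = 8            # hypothetical k-mer length
THRESHOLD = 0.5  # hypothetical minimum Jaccard similarity to join a cluster


@dataclass
class Clone:
    clone_id: str  # hypothetical unique ID, e.g. "<library>-<plate>-<well>"
    tag: str       # 3'-end sequence read of the insert


@dataclass
class Cluster:
    cluster_id: int
    members: List[Clone] = field(default_factory=list)
    kmers: Set[str] = field(default_factory=set)  # k-mers of the founding tag


def kmers(seq: str, k: int = K) -> Set[str]:
    """Return the set of overlapping k-mers (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Shared fraction of k-mers; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


class IncrementalClusterer:
    def __init__(self) -> None:
        self.clusters: List[Cluster] = []

    def add(self, clone: Clone) -> int:
        """Assign one newly sequenced 3' tag to a cluster and return its ID."""
        query = kmers(clone.tag)
        best: Optional[Cluster] = None
        best_score = 0.0
        for cluster in self.clusters:
            score = jaccard(query, cluster.kmers)
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= THRESHOLD:
            best.members.append(clone)
            return best.cluster_id
        # No sufficiently similar cluster: this tag founds a new one.
        new = Cluster(len(self.clusters) + 1, [clone], query)
        self.clusters.append(new)
        return new.cluster_id

    def rearray(self) -> List[str]:
        """Pick one clone per cluster (longest tag; ties go to the earliest added)."""
        return [max(c.members, key=lambda m: len(m.tag)).clone_id
                for c in self.clusters]
```

Comparing each new tag only against cluster founders keeps every insertion to a single pass over the existing clusters, which makes per-tag assignment cheap. The cost is that results depend on the order in which tags arrive. A production system handling hundreds of thousands of tags would more plausibly use alignment-based comparison and an index rather than this linear scan.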
