Construction and characterization of human brain cDNA libraries suitable for analysis of cDNA clones encoding relatively large proteins.

Analysis of proteins registered in the PIR protein database implied that most relatively large proteins are involved in important functions in higher multicellular organisms, yet relatively few large proteins have been registered to date. To establish a protocol for the efficient analysis of cDNA clones encoding large proteins, we constructed a series of strictly size-fractionated human brain cDNA libraries with average insert sizes ranging from 3.3 kb to 10 kb. Hybridization analysis with probes derived from mRNAs of known sizes showed that the libraries with insert sizes of up to at least 7 kb contained clones corresponding to full-length transcripts, in addition to truncated products of longer transcripts, but few chimeric clones. Using one of the fractionated libraries, with an average insert size of 7 kb, we determined single-pass sequences from both ends of randomly sampled clones and searched them against DNA databases. Approximately 90% of the clones were new with respect to their 5'-end sequences, whereas their 3'-end sequences were frequently similar to registered expressed sequence tags (ESTs). Examination of protein-coding capacity in an in vitro transcription/translation system showed that about 20% of the clones directed the synthesis of proteins with apparent molecular masses larger than 50 kDa. This set of libraries should be very useful for accumulating sequence data on large proteins in the human brain.
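As a rough illustration of why long inserts are needed to capture large proteins, the sketch below converts a protein mass into the minimum open reading frame (ORF) length required to encode it, and gives the upper bound on protein mass if an entire insert were coding. It assumes the common rule of thumb of about 110 Da per amino acid residue; that value and the helper names `min_orf_length_nt` and `max_protein_kda` are illustrative assumptions, not figures or methods from this study. Note also that apparent molecular masses measured on gels are themselves estimates.

```python
import math

# Assumed rule-of-thumb average mass of one amino acid residue (daltons).
# This is an approximation for illustration, not a value from the study.
AVG_RESIDUE_MASS_DA = 110.0


def min_orf_length_nt(protein_kda: float, include_stop: bool = True) -> int:
    """Hypothetical helper: minimum ORF length (nt) needed to encode a protein of this mass."""
    residues = math.ceil(protein_kda * 1000 / AVG_RESIDUE_MASS_DA)
    codons = residues + (1 if include_stop else 0)  # optionally count the stop codon
    return codons * 3


def max_protein_kda(insert_kb: float) -> float:
    """Hypothetical helper: upper bound on protein mass if the whole insert were one ORF."""
    insert_nt = int(round(insert_kb * 1000))  # round to avoid float artifacts such as 3299.999...
    codons = insert_nt // 3
    residues = max(codons - 1, 0)  # reserve one codon for the stop codon
    return residues * AVG_RESIDUE_MASS_DA / 1000


if __name__ == "__main__":
    print(min_orf_length_nt(50))          # 1368
    print(round(max_protein_kda(7), 1))   # 256.5
```

Under this assumption, a protein of 50 kDa needs roughly 1.4 kb of coding sequence, before any 5'- and 3'-untranslated regions are counted, so clones encoding large proteins tend to require inserts well above the typical size of conventional cDNA libraries.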
