Prediction of the coding sequences of mouse homologues of KIAA gene: III. The complete nucleotide sequences of 500 mouse KIAA-homologous cDNAs identified by screening of terminal sequences of cDNA clones randomly sampled from size-fractionated libraries.

Since 1994 we have conducted a human cDNA project to predict protein-coding sequences (CDSs) in large cDNAs (>4 kb), and the number of newly identified genes, known as KIAA genes, already exceeds 2000. The ultimate goal of this project is to clarify the physiological functions of the proteins encoded by KIAA genes. To this end, the project has recently been expanded to include the isolation and characterization of mouse counterparts of the KIAA genes (mKIAA). Here we present the complete sequences and chromosomal loci of 500 mKIAA cDNA clones and of 13 novel cDNA clones that were incidentally identified during this project. The average size of the 513 cDNA sequences was 4.3 kb, and the average length of the amino acid sequences deduced from these cDNAs was 816 residues. Comparison of the predicted CDSs between mouse and human KIAAs suggested that 12 mKIAA cDNA clones are differently spliced isoforms of the corresponding human cDNA clones. The comparison of mouse and human sequences also revealed four pairs of human KIAA cDNAs in which both members of the pair are derived from a single gene. Notably, a homology search against public databases indicated that 4 of the 13 novel cDNA clones are homologous to disease-related genes.
