Cysteine and tyrosine-rich 1 (CYYR1), a novel unpredicted gene on human chromosome 21 (21q21.2), encodes a cysteine and tyrosine-rich protein and defines a new family of highly conserved vertebrate-specific genes.

A novel human gene has been identified by in-depth bioinformatics analysis of chromosome 21 segment 40/105 (21q21.1), with no coding region predicted in any previous analysis. Sequencing of brain-derived complementary DNA (cDNA) predicts a 154-amino acid product with no similarity to any known protein. The gene has been named cysteine and tyrosine-rich protein 1 gene (symbol CYYR1, cysteine and tyrosine-rich 1). Northern blot analysis detected CYYR1 messenger RNA in a broad range of tissues as two transcripts of 3.4 and 2.2 kb. The gene consists of four exons and spans about 107 kb, including a very large intron of 85.8 kb. Analysis of expressed sequence tags shows high CYYR1 expression in cells belonging to the amine precursor uptake and decarboxylation (APUD) system. We also cloned the cDNA of the murine ortholog Cyyr1, which was mapped with a radiation hybrid panel to chromosome 16, within the region corresponding to the one that contains the human gene on chromosome 21. Sequence and phylogenetic analysis led to the identification of several genes encoding proteins homologous to CYYR1. The most prominent feature of the protein family is a unique central cysteine and tyrosine-rich domain, which is strongly conserved from lower vertebrates (fishes) to humans but is absent in bacteria and invertebrates.
