Abundant novel transcriptional units and unconventional gene pairs on human chromosome 22.

Novel transcriptional units (TUs) are EST-supported transcribed features that do not correspond to known genes. Unconventional gene pairs (UGPs) are pairs of genes and/or TUs that share exon-to-exon cis-antisense overlaps or putative bidirectional promoters. Computational discovery of TUs and UGPs, followed by manual curation, was performed across the entire published 34.9-Mb euchromatic sequence of human chromosome 22. Novel TUs (n = 517) were as abundant as known genes (n = 492) and typically lacked nonprimate DNA and protein homologies. One hundred seventy-one TUs (33%), but only 13 genes (3%), both lacked nonprimate conservation and localized to gaps in the human-mouse BLASTZ alignment. Compared with known genes, novel TUs were richer in exonic primate-specific interspersed repetitive elements (P = 0.001) and were more likely to rely on splice junctions provided by these elements: 19% of spliced TUs, versus 5% of spliced genes, had a splice site within a primate-specific repeat. Hence, novel TUs and known genes may represent different portions of the transcriptome. Two hundred nine chromosome 22 transcripts (21%) participated in 77 cis-antisense and 42 promoter-sharing UGPs. Transcripts involved in both UGP types simultaneously were more common than expected (P = 0.01). UGPs were nonrandomly distributed along the sequence: 89 (75%) clustered in distinct regions totaling 4.4 Mb (<13% of the chromosome). Eighty UGPs (67%) showed significant differences in locus structure between primates and rodents. Because some TUs may be functional noncoding transcripts, and because the cis-regulatory potential of UGPs is well recognized, TUs and UGPs specific to the primate lineage may contribute to the genomic basis of primate-specific phenotypes.
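
The two UGP categories defined above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the `Transcript` class, the function names, and in particular the 1000-bp `max_gap` threshold for a putative bidirectional promoter are assumptions made for illustration only, and coordinates are taken to be 1-based and inclusive on a single chromosome.

```python
from dataclasses import dataclass
from typing import Tuple

Exon = Tuple[int, int]  # (start, end), 1-based inclusive; hypothetical convention


@dataclass(frozen=True)
class Transcript:
    """Hypothetical container for a known gene or a novel TU."""
    name: str
    strand: str               # "+" or "-"
    exons: Tuple[Exon, ...]

    @property
    def tss(self) -> int:
        """Transcription start: lowest coordinate on "+", highest on "-"."""
        if self.strand == "+":
            return min(start for start, _ in self.exons)
        return max(end for _, end in self.exons)


def exons_overlap(a: Transcript, b: Transcript) -> bool:
    """True if any exon of a shares at least one base with any exon of b."""
    return any(s1 <= e2 and s2 <= e1
               for s1, e1 in a.exons
               for s2, e2 in b.exons)


def is_cis_antisense(a: Transcript, b: Transcript) -> bool:
    """Exon-to-exon overlap between transcripts on opposite strands."""
    return a.strand != b.strand and exons_overlap(a, b)


def shares_putative_bidirectional_promoter(a: Transcript, b: Transcript,
                                           max_gap: int = 1000) -> bool:
    """Head-to-head (divergent) pair whose start sites lie within max_gap bp.

    max_gap = 1000 is an illustrative assumption, not a value from the text.
    """
    if a.strand == b.strand:
        return False
    minus, plus = (a, b) if a.strand == "-" else (b, a)
    gap = plus.tss - minus.tss  # positive only when the pair is divergent
    return 0 < gap <= max_gap


# Toy transcripts with made-up coordinates
gene_a = Transcript("A", "+", ((1000, 1200), (1500, 1800)))
tu_b = Transcript("B", "-", ((1700, 2000),))             # overlaps A's last exon
tu_c = Transcript("C", "-", ((100, 300), (500, 700)))    # starts 300 bp upstream of A

print(is_cis_antisense(gene_a, tu_b))                        # True
print(shares_putative_bidirectional_promoter(gene_a, tu_c))  # True
```

In this sketch a transcript such as `gene_a` can belong to both pair types at once, which is the situation the abstract reports as more common than expected.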
