Computational prediction of SEG (single exon gene) function in humans.

Human genes are often interrupted by non-coding intragenic sequences called introns, so that the gene sequence is divided into exons (coding segments) and introns (non-coding segments). Consequently, the majority of human genes are multi exon genes (MEG). However, a considerable number of single exon genes (SEG) are present in the human genome (approximately 12%). This proportion is sizeable, and it is important to probe their molecular function and cellular role. Hence, we performed a genome-wide functional assignment of 3750 SEG sequences using PFAM (protein family database), PROSITE (database of biologically meaningful signatures or motifs) and SUPERFAMILY (a library covering all proteins of known three-dimensional structure). PFAM assigned 13% of SEG to trans-membrane receptor genes of the G-protein coupled receptor (GPCR) family and showed that a majority of SEG proteins have DNA binding function. PROSITE identified 336 unique motif types in them, accounting for 25% of all known patterns, with a majority having PHOSPHORYLATION and ACETYLATION signals. SUPERFAMILY assigned 33% of SEG to membrane all-alpha (proteins containing alpha helix structural elements according to the SCOP (structural classification of proteins) definition). Functional assignment of SEG proteins at multiple levels (sequence signals, sequence families, 3D structures) using PFAM, PROSITE and SUPERFAMILY is expected to suggest their selective and predominant molecular functions in cellular systems. Their function as DNA binding, phosphorylating, acetylating and house-keeping agents is intriguing. The analysis also showed evidence of SEG expression and retro-transposition. However, this information is insufficient to draw firm conclusions about the prevalent role played by these proteins in cellular biology. A complete understanding of SEG function will help to explore their role in the cellular environment. The datasets derived from these analyses are available at http://sege.ntu.edu.sg/wester/intronless/human/.
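The bookkeeping behind figures such as "13% of SEG assigned to the GPCR family" can be illustrated with a minimal sketch. It is not the authors' pipeline: the gene names, exon counts, family labels, and the helper functions `classify_genes` and `family_shares` are all hypothetical, and the family assignments are assumed to have been produced beforehand by a tool such as a PFAM search.

```python
from collections import Counter


def classify_genes(exon_counts):
    """Label each gene 'SEG' if it has exactly one exon, otherwise 'MEG'."""
    return {gene: ("SEG" if n == 1 else "MEG") for gene, n in exon_counts.items()}


def family_shares(assignments):
    """Return the percentage of annotated genes that fall in each family."""
    counts = Counter(assignments.values())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {family: 100.0 * c / total for family, c in counts.items()}


# Hypothetical toy input: gene name -> number of exons.
exon_counts = {"geneA": 1, "geneB": 4, "geneC": 1, "geneD": 7}
labels = classify_genes(exon_counts)
seg_genes = [gene for gene, label in labels.items() if label == "SEG"]

# Hypothetical family assignments for the SEG subset only.
toy_assignments = {"geneA": "GPCR", "geneC": "zinc finger"}
shares = family_shares(toy_assignments)

print(seg_genes)
print(shares)
```

Separating the SEG/MEG labelling from the per-family tally keeps the sketch independent of which annotation source (sequence signal, sequence family, or 3D structure library) supplies the assignments.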
