SinEx DB 2.0 update 2020: database for eukaryotic single-exon coding sequences

Abstract Single-exon coding sequences (CDSs), also known as ‘single-exon genes’ (SEGs), are defined as nuclear, protein-coding genes that lack introns in their CDSs. They have been studied not only to determine their origin and evolution but also because their expression has been linked to several types of human cancers and neurological/developmental disorders, and many exhibit tissue-specific transcription. We previously developed SinEx DB, which houses DNA and protein sequence information for SEGs from 10 mammalian genomes, including human. SinEx DB also includes their functional predictions, based on euKaryotic Orthologous Groups (KOG), and the relative distribution of these functions within species. Here, we report SinEx DB 2.0, a major update of SinEx DB that includes information on the occurrence, distribution and functional prediction of SEGs from 60 completely sequenced eukaryotic genomes, representing animals, fungi, protists and plants. The information is stored in a relational database built with MySQL Server 5.7, and the complete dataset of SEG sequences and their Gene Ontology (GO) functional assignments is available for download. SinEx DB 2.0 was built with a novel pipeline that helps disambiguate single-exon isoforms from SEGs. SinEx DB 2.0 is the largest available database for SEGs and provides a rich source of information for advancing our understanding of the evolution and function of SEGs and their associations with disorders, including cancers and neurological and developmental diseases. Database URL: http://v2.sinex.cl/
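The distinction between an SEG and a single-exon isoform can be illustrated with a minimal sketch. This is not the SinEx DB 2.0 pipeline, whose details the abstract does not give. It assumes a simplified rule: a gene counts as an SEG only if every one of its coding transcripts has a single CDS exon, whereas a gene with both single-exon and multi-exon coding transcripts is treated as a multi-exon gene with a single-exon isoform. The function name `classify_genes`, the input tuple layout and the gene identifiers are all hypothetical.

```python
from collections import defaultdict


def classify_genes(transcripts):
    """Classify genes from (gene_id, transcript_id, n_cds_exons) records.

    Assumed rule (illustrative only, not the published pipeline):
      - all coding transcripts have one CDS exon  -> "SEG"
      - some, but not all, have one CDS exon      -> "single-exon isoform"
      - none have one CDS exon                    -> "multi-exon gene"
    """
    cds_exon_counts = defaultdict(list)
    for gene_id, _transcript_id, n_cds_exons in transcripts:
        if n_cds_exons < 1:
            continue  # no CDS: not a protein-coding transcript, skip it
        cds_exon_counts[gene_id].append(n_cds_exons)

    labels = {}
    for gene_id, counts in cds_exon_counts.items():
        if all(n == 1 for n in counts):
            labels[gene_id] = "SEG"
        elif any(n == 1 for n in counts):
            labels[gene_id] = "single-exon isoform"
        else:
            labels[gene_id] = "multi-exon gene"
    return labels


# Hypothetical records: (gene_id, transcript_id, number of CDS exons)
example = [
    ("geneA", "tA1", 1),
    ("geneB", "tB1", 1),
    ("geneB", "tB2", 4),
    ("geneC", "tC1", 3),
]
labels = classify_genes(example)
```

Under this assumed rule, `geneB` is not reported as an SEG even though one of its transcripts has a single CDS exon, which is the kind of ambiguity the abstract says the pipeline addresses.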
