RNAcentral 2021: secondary structure integration, improved sequence search and new member databases

Abstract

RNAcentral is a comprehensive database of non-coding RNA (ncRNA) sequences that provides a single access point to 44 RNA resources and >18 million ncRNA sequences from a wide range of organisms and RNA types. RNAcentral now also includes secondary (2D) structure information for >13 million sequences, making RNAcentral the world’s largest RNA 2D structure database. The 2D diagrams are displayed using R2DT, a new 2D structure visualization method that uses consistent, reproducible and recognizable layouts for related RNAs. The sequence similarity search has been updated with a faster interface featuring facets for filtering search results by RNA type, organism, source database or any keyword. This sequence search tool is available as a reusable web component, and has been integrated into several RNAcentral member databases, including Rfam, miRBase and snoDB. To allow for a more fine-grained assignment of RNA types and subtypes, all RNAcentral sequences have been annotated with Sequence Ontology terms. The RNAcentral database continues to grow and provide a central data resource for the RNA community. RNAcentral is freely available at https://rnacentral.org.
