LeishDB: a database of coding gene annotation and non-coding RNAs in Leishmania braziliensis

Abstract

Leishmania braziliensis is the etiological agent of cutaneous leishmaniasis, a disease of high public health importance affecting 12 million people worldwide. Although its genome sequence was originally published in 2007, the two public reference annotations still classify at least 80% of the genes simply as hypothetical or putative proteins. Furthermore, non-coding RNA (ncRNA) sequences from Leishmania species are notably absent from public databases. These poorly annotated coding genes and ncRNAs could be important for understanding the biology of this protozoan, the mechanisms behind host-parasite interactions, and disease control. Herein, we performed a new prediction and annotation of L. braziliensis protein-coding genes and ncRNAs using recently developed predictive algorithms and updated databases. In summary, we identified 11 491 ORFs, 5263 (45.80%) of which are associated with proteins available in public databases. Moreover, we identified for the first time a repertoire of 11 243 ncRNAs belonging to different classes distributed along the genome. The accuracy of our predictions was verified with transcriptional evidence from RNA-seq, confirming that the predicted features generate real transcripts. These data were organized in a public repository named LeishDB (www.leishdb.com), which improves the publicly available genomic annotation data for L. braziliensis. This updated information can be useful for future genomics, transcriptomics and metabolomics studies. It can also serve as an additional tool for genome annotation pipelines and for new studies aimed at understanding the complexity, organization and biology of this protozoan genome, and at developing innovative methodologies for disease control and diagnostics.

Database URL: www.leishdb.com
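The abstract states that the predictions were checked against RNA-seq transcriptional evidence. A minimal sketch of that kind of check follows. It is not the LeishDB pipeline: it assumes aligned reads and predicted features are already available as half-open genomic intervals on a single chromosome, and the `Feature` class, the `min_reads` threshold of 5, and the example coordinates are all hypothetical. The sketch counts the reads that overlap each feature and keeps only the features that reach the threshold. It also reproduces the percentage reported for ORFs with database matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A predicted ORF or ncRNA (hypothetical representation)."""
    name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive (half-open interval)


def count_overlapping_reads(feature, reads):
    """Count aligned reads (start, end), also half-open, that overlap the feature by >= 1 bp."""
    return sum(1 for r_start, r_end in reads
               if r_start < feature.end and r_end > feature.start)


def supported_features(features, reads, min_reads=5):
    """Return names of features with at least `min_reads` overlapping reads.

    The default threshold of 5 is an illustrative assumption, not a value from LeishDB.
    """
    return [f.name for f in features
            if count_overlapping_reads(f, reads) >= min_reads]


# Toy data on a single chromosome (hypothetical coordinates).
features = [Feature("orf_1", 100, 400), Feature("ncrna_1", 1000, 1100)]
reads = [(90, 190), (150, 250), (300, 400), (350, 450), (380, 480), (2000, 2100)]

print(supported_features(features, reads))  # ['orf_1']

# Share of the 11 491 ORFs that matched proteins in public databases.
print(f"{5263 / 11491:.2%}")  # 45.80%
```

In practice this overlap counting is usually delegated to dedicated tools that work on real alignment files and annotations. The plain interval test is used here only to keep the example self-contained.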
