GeneSCF: a real-time based functional enrichment tool with support for multiple organisms

Background: High-throughput technologies such as ChIP-sequencing, RNA-sequencing, DNA sequencing and quantitative metabolomics generate a huge volume of data. Researchers often rely on functional enrichment tools to interpret the biological significance of the affected genes from these high-throughput studies. However, currently available functional enrichment tools need to be updated frequently to keep up with new entries in the functional database repositories. Hence there is a need for a simplified tool that can perform functional enrichment analysis using updated information taken directly from source databases such as KEGG, Reactome or Gene Ontology.

Results: In this study, we focused on designing a command-line tool called GeneSCF (Gene Set Clustering based on Functional annotations) that can predict the functionally relevant biological information for a set of genes using annotations retrieved in real time. It is designed to handle information for more than 4000 organisms from prominent, freely available functional databases such as KEGG, Reactome and Gene Ontology. We successfully applied our tool to two published datasets to predict the biologically relevant functional information. The core features of this tool were tested on Linux machines without the need to install additional dependencies.

Conclusions: GeneSCF is more reliable than other enrichment tools because of its ability to use reference functional databases in real time to perform enrichment analysis. It is easy to integrate with other pipelines available for downstream analysis of high-throughput data. More importantly, GeneSCF can run multiple gene lists simultaneously on different organisms, thereby saving time for the users. Since the tool is designed to be ready to use, there is no need for any complex compilation and installation procedures.
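To make the notion of functional enrichment concrete, the following is a minimal sketch of a generic over-representation test based on the hypergeometric distribution. It is an illustration under stated assumptions, not GeneSCF's implementation: the abstract does not specify the statistical test used, and the function name, parameter names and the numbers in the usage example are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """Upper-tail hypergeometric p-value P(X >= k).

    Hypothetical parameter names, chosen for this sketch only:
      N: number of annotated genes in the background (e.g. the organism)
      K: number of background genes annotated to the term or pathway
      n: number of genes in the user's gene list
      k: number of genes in the list annotated to the term or pathway
    """
    upper = min(n, K)  # the overlap cannot exceed either set size
    total = comb(N, n)  # all ways to draw n genes from the background
    # Sum the probabilities of observing an overlap of k or more genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Made-up numbers: 20000 background genes, 150 in a pathway,
    # 300 genes in the list, 12 of which fall in the pathway.
    p = hypergeom_enrichment_p(k=12, n=300, K=150, N=20000)
    print(f"enrichment p-value: {p:.3g}")
```

In practice such a test would be repeated for every term in the chosen database, followed by a multiple-testing correction; the annotation counts would come from whichever version of KEGG, Reactome or Gene Ontology is current at run time.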
