Knowledge-guided analysis of "omics" data using the KnowEnG cloud platform

We present KnowEnG, a free-to-use computational system for analysis of genomics data sets, designed to accelerate biomedical discovery. It includes tools for popular bioinformatics tasks such as gene prioritization, sample clustering, gene set analysis and expression signature analysis. The system offers ‘knowledge-guided’ data-mining and machine learning algorithms, in which user-provided data are analyzed in light of prior information about genes, aggregated from numerous knowledge bases and encoded in a massive ‘Knowledge Network’. KnowEnG adheres to ‘FAIR’ principles: its tools are easily portable to diverse computing environments, run on the cloud for scalable and cost-effective execution of compute-intensive and data-intensive algorithms, and are interoperable with other computing platforms. The tools are available through multiple access modes, including a web portal, and include specialized visualization modules. We present use cases and re-analysis of published cancer data sets using KnowEnG tools and demonstrate the system's potential value in democratizing advanced tools for the modern genomics era.
