OmicsMapNet: Transforming omics data to take advantage of Deep Convolutional Neural Network for discovery

We developed the OmicsMapNet approach to take advantage of existing deep learning frameworks by analyzing high-dimensional omics data as 2-dimensional images. The omics data of each sample were first rearranged into a 2D image in which molecular features related by function, ontology, or other relationships were placed in spatially adjacent, patterned locations. Deep neural networks were then trained to classify the images, and the molecular features most informative for distinguishing phenotype classes were subsequently identified. As an example, we used the KEGG BRITE database to rearrange RNA-Seq expression data of TCGA diffuse glioma samples into treemaps that capture the functional hierarchical structure of genes in 2D images. Deep convolutional neural networks (CNNs) were built using tools from TensorFlow to learn the grade of TCGA LGG and GBM samples with relatively high accuracy. The most contributory features in the trained CNNs were confirmed by pathway analysis for their plausible functional involvement.
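The rearrangement step can be illustrated with a minimal sketch. It assumes a simple slice-and-dice treemap layout, a toy two-level hierarchy, and made-up expression values; the abstract does not specify the treemap algorithm, image size, or normalization, and the hierarchy below is not actual KEGG BRITE content. The function names (`count_leaves`, `layout`, `render`) are hypothetical.

```python
import numpy as np

def count_leaves(node):
    """Number of genes (leaf strings) under a hierarchy node."""
    if isinstance(node, str):
        return 1
    children = node.values() if isinstance(node, dict) else node
    return sum(count_leaves(c) for c in children)

def layout(node, x0, y0, x1, y1, depth=0, out=None):
    """Slice-and-dice treemap: give each gene a rectangle, alternating
    the split direction at each level of the hierarchy."""
    if out is None:
        out = {}
    if isinstance(node, str):  # leaf node = one gene
        out[node] = (x0, y0, x1, y1)
        return out
    children = list(node.values()) if isinstance(node, dict) else list(node)
    sizes = [count_leaves(c) for c in children]
    total = sum(sizes)
    offset = 0
    for child, size in zip(children, sizes):
        lo, hi = offset / total, (offset + size) / total
        if depth % 2 == 0:  # split along x at even depths
            layout(child, x0 + lo * (x1 - x0), y0,
                   x0 + hi * (x1 - x0), y1, depth + 1, out)
        else:               # split along y at odd depths
            layout(child, x0, y0 + lo * (y1 - y0),
                   x1, y0 + hi * (y1 - y0), depth + 1, out)
        offset += size
    return out

def render(hierarchy, expression, size=64):
    """Paint each gene's rectangle with its expression value."""
    img = np.zeros((size, size), dtype=np.float32)
    for gene, (x0, y0, x1, y1) in layout(hierarchy, 0, 0, size, size).items():
        c0, c1 = int(round(x0)), int(round(x1))
        r0, r1 = int(round(y0)), int(round(y1))
        img[r0:r1, c0:c1] = expression.get(gene, 0.0)
    return img

# Toy hierarchy (illustrative grouping only, not taken from KEGG BRITE).
hierarchy = {
    "Metabolism": {"Glycolysis": ["HK1", "PKM"], "TCA cycle": ["IDH1", "CS"]},
    "Signaling": {"RTK": ["EGFR", "PDGFRA"], "PI3K": ["PTEN", "PIK3CA"]},
}
# Made-up expression values for a single sample.
expression = {"HK1": 5.0, "PKM": 7.5, "IDH1": 3.25, "CS": 4.0,
              "EGFR": 9.0, "PDGFRA": 6.5, "PTEN": 2.0, "PIK3CA": 4.5}

img = render(hierarchy, expression)  # one 64x64 single-channel image per sample
```

Genes that share a parent category end up in neighboring rectangles, which is the property that lets convolutional filters pick up functionally coherent patterns. Stacking one such image per sample would give an array suitable as input to an image classifier.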
