Multi-omics Pathways Workflow (MOPAW): An Automated Multi-omics Workflow on the Cancer Genomics Cloud

Introduction: In the era of big data, gene-set pathway analyses derived from multi-omics data are exceptionally powerful. However, the installation steps and programming skills required to prepare and analyze high-dimensional multi-omics data with existing tools can be challenging, especially for users who are not familiar with coding. In addition, running these tools efficiently requires high-performance computing resources.

Methods: We introduce an automated multi-omics pathway workflow, a point-and-click graphical user interface to Multivariate Single Sample Gene Set Analysis (MOGSA), hosted on the Cancer Genomics Cloud by Seven Bridges Genomics. The workflow combines several tools to perform data preparation for each data type, dimensionality reduction, and MOGSA pathway analysis. The supported omics data include copy number alteration, transcriptomics, proteomics, and phosphoproteomics data. We also provide an additional workflow that downloads data from The Cancer Genome Atlas and the Clinical Proteomic Tumor Analysis Consortium and preprocesses them for use in the multi-omics pathway workflow.

Results: The main outputs of the workflow are the distinct pathways for user-defined subgroups of interest, displayed as heatmaps when such pathways are identified. Additional graphs and tables are provided for review.

Conclusion: The Multi-omics Pathway Workflow requires no coding experience. Users can bring their own data, or use the additional workflow to download and preprocess public datasets from The Cancer Genome Atlas and the Clinical Proteomic Tumor Analysis Consortium for their samples of interest. The workflow identifies distinct overactivated or deactivated pathways for the groups of interest, information that can inform effective therapeutic targeting.
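The Methods steps (per-data-type preparation, dimensionality reduction, single-sample gene-set scoring) can be illustrated with a minimal Python sketch. This is a simplified stand-in and not the MOGSA algorithm or the workflow's actual code: the function name `gene_set_scores`, the Frobenius-norm block weighting, the plain SVD, and the toy matrices are all assumptions made for illustration only.

```python
import numpy as np


def gene_set_scores(blocks, gene_sets, n_components=2):
    """Toy single-sample gene-set scoring across several omics blocks.

    blocks: dict of block name -> (feature_names, matrix of features x samples)
    gene_sets: dict of gene-set name -> set of gene names
    Returns a dict of gene-set name -> array with one score per sample.
    Hypothetical helper, not part of MOGSA or the MOPAW workflow.
    """
    names, scaled = [], []
    for feature_names, matrix in blocks.values():
        x = np.asarray(matrix, dtype=float)
        # Data preparation: center each feature across samples.
        x = x - x.mean(axis=1, keepdims=True)
        # Give each block the same total weight (assumed weighting scheme).
        norm = np.linalg.norm(x)
        if norm > 0:
            x = x / norm
        names.extend(feature_names)
        scaled.append(x)

    # Dimensionality reduction: low-rank reconstruction of the stacked blocks.
    stacked = np.vstack(scaled)
    u, s, vt = np.linalg.svd(stacked, full_matrices=False)
    k = min(n_components, len(s))
    low_rank = (u[:, :k] * s[:k]) @ vt[:k, :]

    # Gene-set scoring: sum the reconstructed rows that belong to each set.
    scores = {}
    for set_name, genes in gene_sets.items():
        mask = np.array([name in genes for name in names])
        scores[set_name] = low_rank[mask].sum(axis=0)
    return scores


# Toy data (invented): 4 samples, genes A and B elevated in samples 0 and 1.
toy_blocks = {
    "transcriptomics": (["A", "B", "C"], [[5, 5, 1, 1], [4, 4, 0, 0], [1, 2, 1, 2]]),
    "proteomics": (["A", "B"], [[3, 3, 1, 1], [2, 2, 0, 0]]),
}
toy_scores = gene_set_scores(toy_blocks, {"set1": {"A", "B"}}, n_components=1)
print(toy_scores["set1"])
```

In this toy case the first two samples receive positive scores for `set1` and the last two receive negative scores, which is the kind of subgroup contrast that the workflow's heatmaps summarize at the pathway level.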
