Extracting Pathway-level Signatures from Proteogenomic Data in Breast Cancer Using Independent Component Analysis

Independent component analysis (ICA) was applied to human breast cancer proteogenomic data, and the resulting pathway-level signatures were integrated with clinical information. Our results demonstrate that ICA can be used to extract biologically relevant signals from multi-omics data in an unsupervised manner.

Highlights
Unsupervised feature extraction from proteogenomics data.
Pathway-level integration of multi-omics data based on clinical features.

Recent advances in multi-omics characterization necessitate knowledge integration across different data types that goes beyond individual biomarker discovery. In this study, we apply independent component analysis (ICA) to human breast cancer proteogenomics data to retrieve mechanistic information. We show that, as an unsupervised feature extraction method, ICA was able to construct signatures with known biological relevance at both the transcriptome and proteome levels. Moreover, proteome and transcriptome signatures can be associated with each other through their respective correlations with patient clinical features, providing an integrated description of phenotype-related biological processes. Our results demonstrate that applying ICA to proteogenomics data can lead to pathway-level knowledge discovery. Extending this approach to other data and cancer types may contribute to pan-cancer integration of multi-omics information.
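The general idea described above (decompose an expression matrix with ICA, then relate each component to a clinical feature) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data are synthetic, the genes-by-samples orientation is an assumption, the use of scikit-learn's `FastICA` is an assumption, and the names `extract_signatures`, `clinical_correlations`, and `clinical` are hypothetical.

```python
import numpy as np
from sklearn.decomposition import FastICA


def extract_signatures(expr, n_components, seed=0):
    """Decompose a genes x samples matrix with ICA.

    Returns gene weights (genes x components) and per-sample
    activity scores (samples x components).
    """
    ica = FastICA(n_components=n_components, random_state=seed, max_iter=1000)
    gene_weights = ica.fit_transform(expr)  # one weight per gene per component
    sample_activity = ica.mixing_           # one score per sample per component
    return gene_weights, sample_activity


def clinical_correlations(sample_activity, clinical):
    """Pearson correlation of each component's sample scores with one clinical variable."""
    n_comp = sample_activity.shape[1]
    return np.array(
        [np.corrcoef(sample_activity[:, j], clinical)[0, 1] for j in range(n_comp)]
    )


# Synthetic stand-in for an omics matrix (assumption: no real data is used here).
rng = np.random.default_rng(0)
n_genes, n_samples, k = 500, 40, 3

# Hypothetical binary clinical feature, one value per sample.
clinical = rng.integers(0, 2, size=n_samples).astype(float)

# Hidden sample activities; the first one is tied to the clinical feature.
true_activity = rng.normal(size=(n_samples, k))
true_activity[:, 0] = 2.0 * clinical + 0.1 * rng.normal(size=n_samples)

# Hidden gene weights drawn from a heavy-tailed (non-Gaussian) distribution.
true_weights = rng.laplace(size=(n_genes, k))

# Observed matrix: mixture of the hidden signatures plus a little noise.
expr = true_weights @ true_activity.T + 0.05 * rng.normal(size=(n_genes, n_samples))

gene_weights, sample_activity = extract_signatures(expr, n_components=k)
corrs = clinical_correlations(sample_activity, clinical)
best = int(np.argmax(np.abs(corrs)))
print("component most associated with the clinical feature:", best)
print("correlations:", np.round(corrs, 2))
```

In a sketch like this, the genes with the largest absolute weights in a clinically associated component would be the natural input to a pathway enrichment step, and running the same procedure separately on transcriptome and proteome matrices would give two sets of components that can be compared through their correlations with the same clinical variable. ICA components have arbitrary sign and order, so only the magnitude of each correlation is meaningful here.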
