MCNF: A Novel Method for Cancer Subtyping by Integrating Multi-Omics and Clinical Data

In the age of personalized medicine, there is a great need to classify cancers arising at the same organ site into homogeneous subtypes. Recent advances in genome-wide molecular profiling have made it possible to profile multiple molecular datasets that characterize the genomic changes in various cancer types. This paper focuses on two questions: how to take full advantage of the available omics data, and how to integrate these molecular data with patient clinical data for more systematic cancer subtyping. We propose a new method, Molecular and Clinical Networks Fusion (MCNF), to classify cancer into homogeneous subtypes. The method has two highlights: first, it can integrate both numerical and non-numerical data into the fused network; second, it is unsupervised and can automatically determine the optimal number of clusters.
