Integrating genomic data and pathological images to effectively predict breast cancer clinical outcome

BACKGROUND AND OBJECTIVE: Breast cancer is a leading cause of cancer death among women. Its high mortality is largely attributable to the heterogeneity of invasive breast cancer and its widely varying clinical outcomes. Improving the accuracy of breast cancer survival prediction is therefore important and has become a major research area. Many computational models have been proposed for breast cancer survival prediction; however, most build their predictive models from genomic data alone, and few consider the complementary information in pathological images.

METHODS: We introduce GPMKL, a novel method based on multiple kernel learning (MKL) that integrates heterogeneous data: genomic data (gene expression, copy number alteration, gene methylation, and protein expression) and pathological images. GPMKL fuses these heterogeneous features as part of breast cancer classification, so that feature fusion is embedded in the classifier itself.

RESULTS: Performance analysis of GPMKL indicates that pathological image information plays a critical part in accurately predicting the survival time of breast cancer patients. Compared with existing breast cancer survival prediction methods, the proposed framework with pathological images performs remarkably better.

CONCLUSIONS: These results demonstrate the usefulness and superiority of GPMKL in predicting human breast cancer survival.
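To make the kernel-fusion idea in METHODS concrete, here is a minimal sketch of combining one kernel per data modality into a single weighted kernel. It is not the GPMKL implementation: the abstract does not specify the kernels, the weight-learning procedure, or the classifier, so this sketch assumes RBF kernels, substitutes a simple kernel-target alignment heuristic for a real MKL optimizer, and uses a mean-similarity rule in place of a kernel classifier such as an SVM. The toy data, the names `view_a` and `view_b`, and the label coding (+1 for longer survival, -1 for shorter) are all hypothetical.

```python
import math

def rbf_kernel(X, gamma=1.0):
    """Gram matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
             for xj in X] for xi in X]

def alignment(K, y):
    """Kernel-target alignment: <K, yy^T>_F / (||K||_F * ||yy^T||_F)."""
    n = len(y)
    num = sum(K[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    k_norm = math.sqrt(sum(v * v for row in K for v in row))
    return num / (k_norm * n)  # ||yy^T||_F equals n for labels in {-1, +1}

def kernel_weights(kernels, y):
    """Heuristic weights: non-negative alignments, normalised to sum to 1."""
    scores = [max(alignment(K, y), 0.0) for K in kernels]
    total = sum(scores)
    if total == 0.0:  # no kernel aligns with the labels: fall back to uniform
        return [1.0 / len(kernels)] * len(kernels)
    return [s / total for s in scores]

def combine(kernels, weights):
    """Fused kernel: weighted sum of the per-modality Gram matrices."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels))
             for j in range(n)] for i in range(n)]

def predict(K, y):
    """Stand-in classifier: compare mean similarity to each class."""
    pos = [j for j, label in enumerate(y) if label == 1]
    neg = [j for j, label in enumerate(y) if label == -1]
    preds = []
    for row in K:
        score = (sum(row[j] for j in pos) / len(pos)
                 - sum(row[j] for j in neg) / len(neg))
        preds.append(1 if score >= 0 else -1)
    return preds

# Hypothetical toy data: six patients, two modalities.
labels = [1, 1, 1, -1, -1, -1]
view_a = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
          [1.0, 0.9], [0.9, 1.0], [0.95, 0.95]]      # separates the classes
view_b = [[0.0], [1.0], [0.5], [0.1], [0.9], [0.4]]  # carries little signal

kernels = [rbf_kernel(view_a), rbf_kernel(view_b)]
weights = kernel_weights(kernels, labels)
fused = combine(kernels, weights)
predictions = predict(fused, labels)  # evaluated on the training samples only

print("kernel weights:", weights)
print("predictions:", predictions)
```

On this toy data the informative modality receives most of the weight, which is the behaviour an MKL method aims for when one data source is more predictive than another. A real MKL solver would instead learn the weights jointly with the classifier, and performance would be assessed on held-out patients rather than on the training samples.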
