Combining Pathway Identification and Breast Cancer Survival Prediction via Screening-Network Methods

Breast cancer is one of the most common invasive tumors and causes high mortality among women. It is highly heterogeneous in its biological and clinical characteristics. Several high-throughput assays have been used to collect genome-wide information for many patients in large collaborative studies. This knowledge has improved our understanding of the biology of the disease and led to new methods of diagnosis and treatment. In particular, systems biology has become a valid approach for gaining better insight into the biological mechanisms of breast cancer. A crucial component of current research lies in identifying novel biomarkers that predict patient prognosis on the basis of the molecular signature of the tumor sample. However, the high dimensionality and small sample size of the data greatly increase the difficulty of cancer survival analysis and call for the development of ad hoc statistical methods. In this work, we propose novel screening-network methods that predict patient survival outcome by screening key survival-related genes, and we assess the proposed approaches on the METABRIC dataset. Specifically, we first identify a subset of genes by applying variable screening techniques to gene expression data. Then, we perform Cox regression analysis that incorporates network information associated with the selected subset of genes. The novelty of this work lies in the improved prediction of survival responses obtained by applying different types of screening (biomedical-driven, data-driven, and a combination of the two) before building the network-penalized model. Indeed, combining the two screening approaches allows us to use the available biological knowledge on breast cancer and complement it with additional information emerging from the data used for the analysis.
Moreover, we illustrate how to extend the proposed approaches to integrate an additional omic layer, such as copy number aberrations, and we show that such strategies can further improve prediction. In conclusion, our approaches discriminate patients into high- and low-risk groups using few potential biomarkers and can therefore help clinicians provide more precise prognoses and facilitate the subsequent clinical management of patients at risk of disease.
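
The two-step idea described above (screen genes first, then fit a Cox model with a network penalty on the retained genes) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the implementation used in this work: the screening step ranks genes by a standardized marginal Cox score statistic evaluated at zero, the penalty is written as a lasso term plus a graph-Laplacian quadratic form, and the function names, the mixing parameter `alpha`, and the tuning parameter `lam` are hypothetical. The sketch only evaluates the penalized objective; it contains no solver and assumes no tied event times.

```python
import numpy as np


def marginal_score_screen(X, time, event, keep):
    """Rank genes by a standardized marginal Cox score statistic at beta = 0.

    Illustrative stand-in for a data-driven screening step; returns the
    column indices of the `keep` top-ranked genes.
    """
    order = np.argsort(-time)                      # descending time: risk sets are prefixes
    Xs, ev = X[order], event[order].astype(bool)
    n_at_risk = np.arange(1, len(time) + 1)[:, None]
    mean1 = np.cumsum(Xs, axis=0) / n_at_risk      # risk-set mean of each gene
    mean2 = np.cumsum(Xs ** 2, axis=0) / n_at_risk
    U = (Xs - mean1)[ev].sum(axis=0)               # score, summed over observed events
    V = (mean2 - mean1 ** 2)[ev].sum(axis=0)       # its variance estimate
    z = np.abs(U) / np.sqrt(V + 1e-12)
    return np.argsort(-z)[:keep]


def graph_laplacian(A):
    """Unnormalized Laplacian L = D - A of a symmetric adjacency matrix A."""
    return np.diag(A.sum(axis=1)) - A


def cox_neg_log_partial_likelihood(beta, X, time, event):
    """Negative log partial likelihood (assumes no tied event times)."""
    order = np.argsort(-time)
    eta = X[order] @ beta
    ev = event[order] == 1
    # log of sum_{j in risk set} exp(eta_j), computed as a running log-sum-exp
    log_risk = np.logaddexp.accumulate(eta)
    return -np.sum((eta - log_risk)[ev])


def network_penalized_objective(beta, X, time, event, L, lam, alpha):
    """Scaled Cox loss plus lam * (alpha * |beta|_1 + (1 - alpha) * beta' L beta)."""
    fit = cox_neg_log_partial_likelihood(beta, X, time, event) / len(time)
    penalty = lam * (alpha * np.abs(beta).sum() + (1 - alpha) * beta @ L @ beta)
    return fit + penalty


if __name__ == "__main__":
    # Synthetic data only: 200 patients, 50 genes, gene 0 drives the hazard.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))
    time = rng.exponential(1.0 / np.exp(2.0 * X[:, 0]))
    event = np.ones(200, dtype=int)

    kept = marginal_score_screen(X, time, event, keep=5)
    A = np.zeros((5, 5))
    A[0, 1] = A[1, 0] = 1.0                        # hypothetical edge between two kept genes
    value = network_penalized_objective(
        np.zeros(5), X[:, kept], time, event, graph_laplacian(A), lam=0.1, alpha=0.5
    )
    print(kept, value)
```

In this sketch the Laplacian term penalizes differences between the coefficients of genes that are linked in the network, while the lasso term keeps the signature sparse. A biomedical-driven screening would instead fix the retained set from prior knowledge, and a combined screening would take the union of the two sets before building `L`.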
