Biologically inspired survival analysis based on integrating gene expression as mediator with genomic variants

Accurately linking cancer molecular profiles with survival can improve the clinical management of cancer. However, existing survival analyses rely on statistical evidence from a single level of data and pay little attention to integrating interacting multi-level data or to the underlying biology. Advances in genomic techniques provide unprecedented power to characterize cancer tissue more completely than before, offering the opportunity to design biologically informed, integrative approaches to survival data analysis. Human cancer is characterized by somatic copy number alterations and unique gene expression profiles. However, it remains largely unclear how to integrate gene expression and genetic variant data to better predict patient survival and improve our understanding of disease progression. Consistent with the biological hierarchy from DNA to RNA, we prioritize each survival-relevant feature with two separate scores, predictive and mechanistic. For mRNA expression levels, predictive features are those mRNAs whose variation in expression is associated with survival outcome, and mechanistic features are those mRNAs whose variation in expression is associated with genomic variants. We then integrate information from the predictive model and the mechanistic model simultaneously through our new approach, GEMPS (Gene Expression as a Mediator for Predicting Survival). Applied to two cancer types (ovarian cancer and glioblastoma multiforme), our method achieved better predictive power (p-values ranging from 6.18E-03 to 5.15E-11) than peer methods (GE.CNAs and GE.CNAs.Lasso). Gene set enrichment analysis confirms that the genes used in the final survival analysis are biologically important and relevant.
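The two-score idea can be illustrated with a minimal sketch. This is not the GEMPS procedure itself, whose scoring and integration steps are not specified in the abstract. As stand-ins, the sketch assumes a per-gene concordance index (Harrell's C) for the predictive score, the squared Pearson correlation between a gene's expression and its copy number for the mechanistic score, and a simple product to combine them. All function names and the toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def concordance(risk, time, event):
    """Harrell's C: share of comparable pairs in which the higher-risk patient fails first.

    A pair (i, j) is comparable when patient i has an observed event (event[i] == 1)
    and a shorter follow-up time than patient j. Ties in risk count as 0.5.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else 0.5


def predictive_score(expr, time, event):
    """Assumed stand-in: distance of the concordance index from chance, scaled to [0, 1]."""
    return abs(concordance(expr, time, event) - 0.5) * 2


def mechanistic_score(expr, cna):
    """Assumed stand-in: squared correlation between expression and copy number."""
    return pearson(expr, cna) ** 2


def combined_score(expr, cna, time, event):
    """Assumed combination rule: product of the two scores."""
    return predictive_score(expr, time, event) * mechanistic_score(expr, cna)


# Hypothetical toy cohort: six patients, follow-up time and event indicator (0 = censored).
time = [2, 4, 6, 8, 10, 12]
event = [1, 1, 0, 1, 1, 0]

# gene_a: expression tracks both survival and copy number.
gene_a_expr = [9, 8, 7, 5, 3, 1]
gene_a_cna = [2, 2, 1, 1, 0, 0]

# gene_b: expression is unrelated to copy number.
gene_b_expr = [5, 1, 4, 2, 5, 3]
gene_b_cna = [1, 1, 1, 0, 0, 0]

score_a = combined_score(gene_a_expr, gene_a_cna, time, event)
score_b = combined_score(gene_b_expr, gene_b_cna, time, event)
print("gene_a:", score_a, "gene_b:", score_b)
```

Under these assumptions, a gene ranks highly only when its expression is associated with both survival and copy number, which mirrors the mediator role the abstract assigns to gene expression. A product is only one possible combination rule; a weighted sum or rank aggregation would serve the same illustrative purpose.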
