A new semi-supervised learning model combined with Cox and SP-AFT models in cancer survival analysis

Gene selection is an important task in cancer survival analysis. Most existing supervised learning methods can use only the labeled biological data, so the censored (weakly labeled) data, which far outnumber the labeled data, are ignored in model building. To exploit the information in the censored data, a semi-supervised learning framework (the Cox-AFT model) combining the Cox proportional hazards (Cox) model and the accelerated failure time (AFT) model has been used in cancer research, and it performs better than either the Cox or the AFT model alone. This method, however, is easily affected by noise. To alleviate this problem, in this paper we combine the Cox-AFT model with the self-paced learning (SPL) method so that the information in the censored data is used more effectively, in a self-learning way. SPL is a reliable and stable learning mechanism, recently proposed to simulate the human learning process. Here it helps the AFT model automatically identify high-confidence samples and include them in training, which minimizes interference from highly noisy samples. Using the SPL method produces two direct advantages: (1) the utilization of censored data is further promoted; (2) the noise delivered to the model is greatly decreased. The experimental results demonstrate the effectiveness of the proposed model compared to the traditional Cox-AFT model.
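The self-paced selection idea described above can be illustrated with a minimal sketch. It is not the paper's SP-AFT procedure: as assumptions for illustration only, it uses ordinary least squares as a stand-in for the AFT regression, a hard-threshold selection rule, and hypothetical names (`spl_hard_weights`, `self_paced_least_squares`, the age parameter `lam`, and its `growth` factor). It only shows how low-loss ("high-confidence") samples can be admitted first while noisy ones are held back.

```python
import numpy as np


def spl_hard_weights(losses, lam):
    """Hard-threshold SPL rule: weight 1 if the sample's loss is below lam, else 0."""
    return (losses < lam).astype(float)


def self_paced_least_squares(X, y, lam=1.0, growth=1.5, n_rounds=10):
    """Alternate between fitting on selected samples and re-selecting easy samples.

    Least squares stands in for the AFT model here (an assumption of this sketch).
    """
    n, p = X.shape
    v = np.ones(n)  # start by trusting every sample
    beta = np.zeros(p)
    for _ in range(n_rounds):
        # Step 1: fit the model on the currently selected samples only.
        sel = v > 0
        beta, *_ = np.linalg.lstsq(X[sel], y[sel], rcond=None)
        # Step 2: re-score every sample and keep the low-loss ones.
        losses = (y - X @ beta) ** 2
        v = spl_hard_weights(losses, lam)
        if v.sum() < p:
            # Fallback (an assumption of this sketch): keep the p easiest samples.
            v = np.zeros(n)
            v[np.argsort(losses)[:p]] = 1.0
        # Step 3: raise the age parameter so harder samples may enter later.
        lam *= growth
    return beta, v
```

With synthetic data in which a few responses are heavily corrupted, the returned weights `v` would be expected to be zero for the corrupted samples, so they do not influence the final coefficients.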
