Integrative analysis based on survival-associated co-expression gene modules for predicting neuroblastoma patients’ survival time

Background
More than 90% of neuroblastoma patients in the low-risk group are cured, whereas fewer than 50% of those with high-risk disease can be cured. Because high-risk patients still have poor outcomes, more accurate stratification is needed to establish individualized, precise treatment plans and to improve the long-term survival rate.

Results
We focus on extracting features and providing a workflow to improve survival prediction for neuroblastoma patients. Using a workflow for gene co-expression network (GCN) mining in microarray and RNA-Seq datasets, we extracted molecular features from each co-expressed module and summarized them into eigengenes. We then adopted the lasso-regularized Cox proportional hazards model to select the most informative eigengene features based on their association with the risk of metastasis. Nine eigengenes were selected that show strong association with patient survival prognosis. All nine corresponding gene modules are also highly enriched for biological functions or cytoband locations. Three of them are unique to the RNA-Seq data and complement the microarray modules in terms of survival prognosis. We then merged all eigengenes from these unique modules and used an integrative method called Similarity Network Fusion to test their prognostic power. Prognostic accuracy improved significantly compared with using all eigengenes, and a subgroup of patients with a very poor survival rate was identified.

Conclusions
We first compared GCNs mined from microarray and RNA-Seq data and found that each data modality yields unique GCNs enriched with clear biological functions. We then performed a module uniqueness analysis and used the lasso-Cox model to select survival-associated eigengenes. Integrating the unique, survival-associated eigengenes from both data types provides complementary information that leads to more accurate survival prognosis.

Reviewers
Reviewed by Susmita Datta, Marco Chierici and Dimitar Vassilev.
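The eigengene summarization step described in the Results can be illustrated with a minimal Python sketch. It assumes the common WGCNA-style definition of a module eigengene as the first principal component of the module's standardized expression matrix (samples × genes), with its arbitrary SVD sign oriented to agree with the module's average expression. The function name `module_eigengene` is hypothetical and is not taken from the authors' workflow.

```python
import numpy as np

def module_eigengene(expr):
    """Hypothetical helper: summarize one co-expression module as one per-sample profile.

    expr: samples x genes expression matrix for the module's genes
    (genes with zero variance must be removed beforehand).
    Returns the first principal component, scaled to unit variance.
    """
    # Standardize each gene (column) to zero mean and unit variance
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    # The first left singular vector times the first singular value gives the PC1 sample scores
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0] * s[0]
    # SVD sign is arbitrary: orient the eigengene with the average module expression
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    # Columns of z are centered, so PC1 already has zero mean; rescale to unit variance
    return eigengene / eigengene.std(ddof=1)
```

Replacing each module by its eigengene turns a module of any size into a single per-patient covariate, which keeps the downstream survival model small.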

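The lasso-Cox selection step can be sketched in the same spirit. The block below minimizes the Cox negative log partial likelihood (Breslow risk sets, averaged over patients) plus an L1 penalty λ‖β‖₁ using proximal gradient descent with soft-thresholding. Packages such as R's glmnet solve this problem by coordinate descent instead; the helper names, the fixed step size, and the fixed λ are illustrative assumptions, and in practice λ would be chosen by cross-validation. Eigengenes whose coefficients remain nonzero are the selected features.

```python
import numpy as np

def cox_loss_and_grad(beta, X, time, event):
    """Average negative Cox log partial likelihood and its gradient (hypothetical helper).

    X: patients x features (e.g. eigengenes); time: follow-up times;
    event: 1 if the event was observed, 0 if censored. Ties use Breslow risk sets.
    """
    n = X.shape[0]
    eta = X @ beta
    # risk[i, j] = 1 when patient j is still at risk at patient i's time
    risk = (time[None, :] >= time[:, None]).astype(float)
    m = eta.max()                      # shift for numerically stable exponentials
    w = np.exp(eta - m)
    denom = risk @ w                   # sum of w_j over each risk set (always > 0)
    num = risk @ (w[:, None] * X)      # sum of w_j * x_j over each risk set
    d = event.astype(bool)
    loglik = np.sum(eta[d] - np.log(denom[d]) - m)
    grad = np.sum(X[d] - (num / denom[:, None])[d], axis=0)
    return -loglik / n, -grad / n

def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cox(X, time, event, lam, step=0.1, n_iter=2000):
    """Proximal gradient descent for the lasso-penalized Cox model (sketch, fixed step)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        _, grad = cox_loss_and_grad(beta, X, time, event)
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta
```

With standardized eigengenes as the columns of `X`, a larger λ drives more coefficients exactly to zero, so fewer modules are retained as survival-associated features.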