Integration of molecular features with clinical information for predicting outcomes for neuroblastoma patients

Background: Neuroblastoma is one of the most common types of pediatric cancer. In current neuroblastoma prognosis, patients are stratified into high- and low-risk groups. Generally, more than 90% of patients in the low-risk group survive, while fewer than 50% of those with high-risk disease survive. Because the so-called "high-risk" group still mixes patients with good and poor outcomes, a more refined stratification is needed so that patients with poor outcomes receive prompt, individualized treatment to improve their long-term survival, while patients with good outcomes avoid unnecessary overtreatment.

Methods: We first mined co-expressed gene modules from microarray and RNA-seq data of neuroblastoma samples using the weighted network mining algorithm lmQCM, and summarized the resulting modules into eigengenes. A patient similarity weight matrix was then constructed from the module eigengenes using two different approaches. In the last step, a consensus clustering method called Molecular Regularized Consensus Patient Stratification (MRCPS) was applied to aggregate both clinical information (clinical stage and clinical risk level) and multiple eigengene data for refined patient stratification.

Results: The integrative method MRCPS outperformed clinical staging alone and transcriptomic features alone in stratifying the neuroblastoma (NB) cohort. It successfully identified the worst-prognosis group within the clinical high-risk group, in which fewer than 40% of patients survived the first 50 months after diagnosis. It also identified genes that are highly differentially expressed between the best-prognosis and worst-prognosis groups, which are potential gene biomarkers for clinical testing.

Conclusions: To address the need for better prognosis and to facilitate personalized treatment of neuroblastoma, we modified the recently developed bioinformatics workflow MRCPS for refined patient prognosis. It integrates clinical information with molecular features such as gene co-expression. This clustering workflow is flexible, allowing the integration of both categorical and numerical data. The results demonstrate the prognostic power of this integrative analysis workflow, which outperforms the use of transcriptomic data alone or of clinical staging/risk information alone.

Reviewers: This article was reviewed by Lan Hu, Haibo Liu, Julie Zhu and Aleksandra Gruca.
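As an illustration of the eigengene and patient-similarity steps named in the Methods, the sketch below summarizes each gene module by its first principal component across patients and then builds a patient-by-patient similarity matrix. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say which two similarity approaches were used, so the Gaussian kernel, the `sigma` parameter, the function names, and the random toy data are all hypothetical choices made for illustration. Module detection with lmQCM and the MRCPS consensus step are not reproduced here.

```python
import numpy as np


def module_eigengene(expr):
    """Summarize one module (genes x patients array) as its first principal component.

    Assumption: the eigengene is taken as the first right singular vector of the
    gene-standardized expression matrix, a common convention for co-expression modules.
    """
    # Standardize each gene across patients (guarding against zero variance).
    z = expr - expr.mean(axis=1, keepdims=True)
    sd = z.std(axis=1, keepdims=True)
    z = z / np.where(sd == 0, 1.0, sd)
    # Rows of vt are unit-length vectors indexed by patient.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    eig = vt[0]
    # Fix the arbitrary SVD sign so the eigengene tracks mean module expression.
    if np.dot(eig, z.mean(axis=0)) < 0:
        eig = -eig
    return eig


def patient_similarity(eigengenes, sigma=0.5):
    """Gaussian-kernel similarity between patients (hypothetical choice of kernel).

    eigengenes: modules x patients array; returns a patients x patients matrix.
    """
    x = eigengenes.T  # one row of module eigengene values per patient
    sq_dist = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


# Toy data: two made-up modules (5 and 4 genes) measured in 8 patients.
rng = np.random.default_rng(0)
modules = [rng.normal(size=(5, 8)), rng.normal(size=(4, 8))]

eigs = np.vstack([module_eigengene(m) for m in modules])  # modules x patients
sim = patient_similarity(eigs)                            # patients x patients
print(sim.shape)
```

A matrix like `sim` is the kind of numerical input that a consensus clustering step could combine with categorical clinical labels such as stage and risk level; how MRCPS weights and aggregates these sources is described in the full paper rather than in this abstract.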
