Pathway-based identification of a smoking associated 6-gene signature predictive of lung cancer risk and survival

OBJECTIVE Smoking is a prominent risk factor for lung cancer. However, it is not an established prognostic factor for lung cancer in clinical practice. To date, no gene test is available for diagnostic screening of lung cancer risk or for prognostication of clinical outcome in smokers. This study sought to identify a smoking-associated gene signature that provides a more precise diagnosis and prognosis of lung cancer in smokers.

METHODS AND MATERIALS An implication-network-based methodology was used to identify biomarkers by modeling crosstalk with major lung cancer signaling pathways. Specifically, the methodology comprises the following steps: (1) identifying genes significantly associated with lung cancer survival; (2) from the survival genes identified in Step 1, selecting candidate genes that are differentially expressed in smokers versus non-smokers; (3) from these candidate genes, constructing gene coexpression networks based on prediction logic for the smoker group and the non-smoker group, respectively; (4) identifying smoking-mediated differential components, i.e., the unique gene coexpression patterns specific to each group; and (5) from the differential components, identifying genes directly co-expressed with major lung cancer signaling hallmarks.

RESULTS A smoking-associated 6-gene signature for prognosis of lung cancer was identified from a training cohort (n=256). The 6-gene signature separated lung cancer patients into two risk groups with distinct post-operative survival (log-rank P<0.04, Kaplan-Meier analyses) in three independent cohorts (n=427). The expression-defined prognostic prediction is strongly related to smoking association and smoking cessation (P<0.02; Pearson's Chi-squared tests). In multivariate Cox analysis, the 6-gene signature is an accurate prognostic factor (hazard ratio=1.89, 95% CI: [1.04, 3.43]) compared to common clinical covariates. The 6-gene signature also provides an accurate diagnosis of lung cancer, with an overall accuracy of 73% in a cohort of smokers (n=164). The coexpression patterns derived from the implication networks were validated against interactions reported in the literature, retrieved with STRING8, Ingenuity Pathway Analysis, and Pathway Studio.

CONCLUSIONS The pathway-based approach identified a smoking-associated 6-gene signature that predicts lung cancer risk and survival. This gene signature has potential clinical implications for the diagnosis and prognosis of lung cancer in smokers.
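
Steps 3 to 5 of the methodology can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that expression values have already been binarized into high (1) and low (0) calls, and it scores the rule "gene A high implies gene B high" with a del-type statistic from prediction analysis of cross-classifications (one minus observed errors over the errors expected under independence), which is assumed here as the prediction-logic score. The gene labels (`G1`, `G2`, `HALLMARK`), the toy profiles, the 0.8 threshold, and all function names are hypothetical.

```python
from itertools import permutations


def del_statistic(a, b):
    """Del score for the rule 'a high => b high' on 0/1 sequences.

    The error cell is (a high, b low). Returns None when no errors are
    expected under independence, i.e. the rule cannot be tested.
    """
    n = len(a)
    errors = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    expected = sum(a) * (n - sum(b)) / n
    if expected == 0:
        return None
    return 1.0 - errors / expected


def implication_edges(profiles, threshold):
    """Step 3 (sketch): directed edges whose del score reaches the threshold."""
    edges = set()
    for g, h in permutations(sorted(profiles), 2):
        score = del_statistic(profiles[g], profiles[h])
        if score is not None and score >= threshold:
            edges.add((g, h))
    return edges


def differential_components(edges_a, edges_b):
    """Step 4 (sketch): edges unique to each of the two networks."""
    return edges_a - edges_b, edges_b - edges_a


def hallmark_neighbors(edges, hallmarks):
    """Step 5 (sketch): genes sharing an edge with any hallmark gene."""
    found = set()
    for g, h in edges:
        if g in hallmarks and h not in hallmarks:
            found.add(h)
        if h in hallmarks and g not in hallmarks:
            found.add(g)
    return found


# Toy binarized profiles (hypothetical), four samples per group.
smokers = {"G1": [1, 1, 0, 0], "G2": [1, 0, 1, 0], "HALLMARK": [1, 1, 0, 0]}
non_smokers = {"G1": [1, 0, 1, 0], "G2": [1, 1, 0, 0], "HALLMARK": [1, 1, 0, 0]}

smoker_net = implication_edges(smokers, threshold=0.8)
non_smoker_net = implication_edges(non_smokers, threshold=0.8)
smoker_only, non_smoker_only = differential_components(smoker_net, non_smoker_net)

print(hallmark_neighbors(smoker_only, {"HALLMARK"}))
print(hallmark_neighbors(non_smoker_only, {"HALLMARK"}))
```

In this toy setup, `G1` is linked to the hallmark only in the smoker network and `G2` only in the non-smoker network, so each appears in a different differential component. A real analysis would replace the fixed threshold with a significance test and would start from the survival-associated, differentially expressed candidates of Steps 1 and 2.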
