A Hybrid Approach of Gene Sets and Single Genes for the Prediction of Survival Risks with Gene Expression Data

Accumulated biological knowledge is often encoded as gene sets: collections of genes associated with similar biological functions or pathways. The use of gene sets in the analysis of high-throughput gene expression data has been intensively studied and applied in clinical research. However, the main interest has remained in finding modules of biological knowledge, or the corresponding gene sets, that are significantly associated with disease conditions. Risk prediction from censored survival times using gene sets has not been well studied. In this work, we propose a hybrid method that uses single-gene and gene-set information together to predict patient survival risks from gene expression profiles. In the proposed method, gene sets provide context-level information that is poorly reflected by single genes. Conversely, single genes supplement the information that gene sets miss because our biomedical knowledge is imperfect. In tests on multiple data sets from cancer and trauma injury, the proposed method showed robust and improved performance compared with conventional approaches that use only single genes or only gene sets. Additionally, we examined the prediction results for the trauma injury data and showed that the modules of biological knowledge used in the prediction by the proposed method were readily interpretable in biological terms. A wide range of survival prediction problems in clinical genomics is expected to benefit from the use of biological knowledge.
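The idea of combining gene-set and single-gene information can be illustrated with a minimal sketch. It is not the method of this work: the abstract does not specify how gene sets are scored or how features are selected. The sketch assumes, purely for illustration, that a gene set is summarized by the mean expression of its measured member genes. The function name `hybrid_features` and the `set:`/`gene:` prefixes are hypothetical.

```python
from statistics import mean


def hybrid_features(expression, gene_sets, single_genes):
    """Build one patient's feature vector from gene sets and single genes.

    expression   : dict mapping gene name -> expression value for one patient
    gene_sets    : dict mapping set name -> list of member gene names
    single_genes : list of gene names used as individual features

    Assumption (not from the source): a gene set is scored as the mean
    expression of those member genes that were actually measured.
    """
    features = {}

    # Context-level features: one summary score per gene set.
    for set_name, members in gene_sets.items():
        measured = [expression[g] for g in members if g in expression]
        if measured:  # skip sets with no measured member genes
            features["set:" + set_name] = mean(measured)

    # Single-gene features: cover signal that no gene set captures.
    for gene in single_genes:
        if gene in expression:
            features["gene:" + gene] = expression[gene]

    return features


if __name__ == "__main__":
    patient = {"A": 1.0, "B": 3.0, "C": -2.0}
    sets = {"S1": ["A", "B", "X"]}  # "X" is not measured for this patient
    print(hybrid_features(patient, sets, ["C"]))
```

A feature vector of this kind could then be passed to any survival model that handles censored times, such as a Cox proportional hazards regression; that choice is likewise an assumption here.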
