Integration of gene signatures using biological knowledge

OBJECTIVE Gene expression patterns that distinguish clinically significant disease subclasses may not only play a prominent role in diagnosis but also lead to therapeutic strategies that tailor treatment to the particular biology of each disease. Nevertheless, gene expression signatures derived through statistical feature-extraction procedures on population datasets have been rightly criticized, since they share few genes in common even when derived from the same dataset. We focus on the complementary knowledge conveyed by two or more gene-expression signatures through their embedded biological processes and pathways, which together form a meta-knowledge platform of analysis towards a more global, robust and powerful solution.

METHODS The main contribution of this work is the introduction and study of an approach for integrating different gene signatures on the basis of the underlying biological knowledge, in an attempt to derive a unified global solution. It is further recognized that one group's signature does not perform well on another group's data, owing to incompatibilities between microarray technologies and experimental designs. We assess this cross-platform aspect, showing that a unified solution derived through both statistical and biological validation may also help to overcome such inconsistencies.

RESULTS Based on the proposed approach we derived a unified 69-gene signature, which significantly outperforms the initial signatures, achieving an accuracy of 0.73 on 234 new patients, with 81% sensitivity and 64% specificity. The same signature reveals the two prognostic groups in an additional dataset of 286 new patients obtained through a different experimental protocol and microarray platform. Furthermore, it derives two clusters in a dataset from a different platform that differ markedly at both the gene-expression and the survival-prediction levels.
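For readers less familiar with the three figures reported in the results, the following minimal Python sketch shows how accuracy, sensitivity and specificity are conventionally computed from binary predictions. It is an illustration only, not the authors' evaluation code: the function name `classification_metrics`, the label convention (1 for the poor-prognosis class, 0 for the good-prognosis class) and the toy labels are all assumptions introduced here, and the numbers bear no relation to the study's data.

```python
import math


def classification_metrics(y_true, y_pred):
    """Return (accuracy, sensitivity, specificity) for binary labels.

    Assumed convention (hypothetical, not taken from the paper):
    1 = positive class (e.g. poor prognosis), 0 = negative class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else math.nan
    # Sensitivity: fraction of true positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    # Specificity: fraction of true negatives that were detected.
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
    acc, sens, spec = classification_metrics(y_true, y_pred)
    print(f"accuracy={acc:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Accuracy is the prevalence-weighted average of sensitivity and specificity, so reporting all three shows how the errors are distributed between the two prognostic groups.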
