Reducing a Biomarkers List via Mathematical Programming: Application to Gene Signatures to Detect Time-Dependent Hypoxia in Cancer

In biology and the medical sciences, highly parallel biological assays spurred a revolution that led to the emergence of the '-omics' era. Dimensionality reduction techniques are necessary to analyze, interpret, validate, and take advantage of the wealth of high-dimensional data these assays provide. This paper builds on a DNA microarray study that provided gene signatures for hypoxia. These gene signatures were tested on a large breast cancer data set to assess their prognostic power by means of Kaplan-Meier survival, univariate, and multivariate analyses. We explore several mathematical programming-based techniques that aim to reduce the size of each gene signature as much as possible while maintaining the key characteristics of the original signature, namely its prognostic and diagnostic significance. The proposed signature reduction techniques have promising potential uses: by downsizing the relevant data to a manageable size, one can patent the core set of biomarkers and create a dedicated assay (e.g., on a customized array) for routine applications (e.g., in a clinical setting), leading to individualized medicine capabilities. Our experiments show that the reduced hypoxia signatures reproduce, both qualitatively and quantitatively, the behavior of the original ones.
