Sequential Application of Feature Selection and Extraction for Predicting Breast Cancer Aggressiveness

Breast cancer is a heterogeneous disease with large variance in patient prognosis. It is difficult to identify the patients who need adjuvant chemotherapy to survive. Using microarray-based technology and various feature selection techniques, a number of prognostic gene expression signatures have been proposed recently, and these signatures have been shown to outperform traditional clinical guidelines for estimating prognosis. This paper studies the applicability of state-of-the-art feature extraction methods, used together with feature selection methods, for developing more powerful prognosis estimators. Feature selection is used to remove features unrelated to the clinical question under investigation. If the resulting data set is still described by a large number of probes, feature extraction methods can be applied to further reduce its dimensionality. In addition, we derived six new signatures using three independent data sets containing 610 samples in total.
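The two-stage idea described above (feature selection first, then feature extraction on the surviving probes) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are synthetic, the selection criterion is a Welch t-statistic filter, the extraction step is principal component analysis computed through a singular value decomposition, and the function names `select_features`, `fit_pca` and `transform` are hypothetical.

```python
import numpy as np


def select_features(X, y, k):
    """Return indices of the k features with the largest absolute
    Welch t-statistic between class 0 and class 1 (illustrative filter)."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = (a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)
    return np.sort(np.argsort(-np.abs(t))[:k])


def fit_pca(X, n_components):
    """Return the feature means and the leading principal axes of X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def transform(X, mean, components):
    """Project samples onto the principal axes."""
    return (X - mean) @ components.T


# Synthetic stand-in for an expression matrix: 60 samples, 500 probes.
rng = np.random.default_rng(0)
n_samples, n_probes = 60, 500
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_probes))
X[y == 1, :10] += 2.0  # assumption: only the first 10 probes carry signal

# Stage 1: feature selection removes probes unrelated to the outcome.
selected = select_features(X, y, k=50)

# Stage 2: feature extraction further reduces the dimension.
mean, components = fit_pca(X[:, selected], n_components=5)
Z = transform(X[:, selected], mean, components)
print(Z.shape)
```

In a real evaluation, both stages would have to be fitted on the training samples only and then applied unchanged to held-out samples; fitting them on the full data set, as this toy example does, would leak outcome information into the estimate of prognostic performance.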
