On Orthogonal Feature Extraction Model with Applications in Medical Prognosis

Abstract Microarray-based gene expression profiling can provide prognostic information for cancer, and it has become a statistically highly reliable technique. From a computer scientist's point of view, cancer prognosis can be framed as a classification problem. In microarray-based cancer prognosis, selecting critical attributes is widely practiced because many features are irrelevant or redundant. Appropriate feature selection not only filters out irrelevant features but also improves prediction accuracy. Various models and methods have been developed to rank the importance of features; however, no strategy dominates, because each method has its own advantages and drawbacks. In this paper, we propose an Orthogonal Feature Extraction (OFE) model based on feature ranking techniques, which aims to improve cancer prediction accuracy. The proposed model is compared, through 5-fold cross-validation, with the most widely used feature extraction method, Principal Component Analysis (PCA), and with two other existing feature extraction methods, Neighborhood Component Analysis (NCA) and Linear Discriminant Analysis (LDA). Numerical results indicate that the OFE method can efficiently construct combinations of significant variables, which reduces computational complexity, and that it provides noticeably better performance.
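The 5-fold cross-validation protocol mentioned in the abstract can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the OFE model is not specified here, so the sketch uses a hypothetical `identity_extractor` as a placeholder where a feature extraction step (PCA, NCA, LDA, or OFE) would be fitted, and a simple nearest-centroid classifier as a stand-in for whatever classifier the paper uses. All function names and the toy data are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k (train, test) pairs."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def identity_extractor(X_train, y_train):
    """Hypothetical placeholder: a real extractor would be fitted on the
    training fold only and return a function mapping a row to new features."""
    return lambda row: row


def fit_centroids(X, y):
    """Stand-in classifier: compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(centroids, row):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def cross_val_accuracy(X, y, k=5, seed=0, fit_extractor=identity_extractor):
    """Average test accuracy over k folds; the extractor is refitted per fold."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        transform = fit_extractor(X_train, y_train)
        model = fit_centroids([transform(r) for r in X_train], y_train)
        correct = sum(predict_centroid(model, transform(X[i])) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Toy, well-separated two-class data (illustrative only, not microarray data).
X_toy = [[0.1 * i, 1.0] for i in range(10)] + [[5.0 + 0.1 * i, 1.0] for i in range(10)]
y_toy = [0] * 10 + [1] * 10
print(cross_val_accuracy(X_toy, y_toy))
```

Fitting the extractor inside each fold, rather than once on the full data set, keeps information from the test samples out of the extracted features. To compare methods under this protocol, one would pass a different `fit_extractor` for each method and compare the averaged accuracies.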
