Principal component-based feature selection for tumor classification.

One important problem in the analysis of microarray gene expression data is tumor classification. This paper proposes a new feature selection method for tumor classification using gene expression data. First, three dimensionality reduction methods, principal component analysis (PCA), factor analysis (FA), and independent component analysis (ICA), are introduced to extract and select features for tumor classification, and the specific steps of each are given. The three algorithms are then compared experimentally on acute leukemia data sets, and PCA is found to outperform FA and ICA in terms of feature load ratio. However, PCA cannot make full use of the category information. To overcome this weakness, Fisher linear discriminant (FLD) is combined with the PCA components, and a new approach, principal component discriminant analysis (PCDA), is proposed that retains the advantages of both methods and classifies better than either PCA or FLD alone. Further experimental results show that feature subsets selected by PCDA have higher classification ability than those of the other related dimensionality reduction methods, and that the proposed algorithm is efficient and feasible for tumor classification.
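
The general idea of combining PCA with FLD can be illustrated with a minimal sketch. The abstract does not give the exact PCDA procedure, so the code below is only one plausible reading: project the samples onto a few principal components, then fit a two-class Fisher discriminant in that subspace. The function names, the synthetic data, the number of components, and the small regularization term are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the mean, the top principal axes, and their explained-variance ratios."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data; rows of Vt are the principal axes.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = (S ** 2) / np.sum(S ** 2)
    return mean, Vt[:n_components], ratio[:n_components]

def fld_fit(Z, y):
    """Two-class Fisher direction w proportional to Sw^{-1} (m1 - m0)."""
    Z0, Z1 = Z[y == 0], Z[y == 1]
    m0, m1 = Z0.mean(axis=0), Z1.mean(axis=0)
    # Within-class scatter matrix.
    Sw = (Z0 - m0).T @ (Z0 - m0) + (Z1 - m1).T @ (Z1 - m1)
    Sw += 1e-6 * np.eye(Sw.shape[0])  # assumed small ridge for numerical stability
    w = np.linalg.solve(Sw, m1 - m0)
    w /= np.linalg.norm(w)
    # Decision threshold at the midpoint of the projected class means.
    threshold = 0.5 * (m0 + m1) @ w
    return w, threshold

def pca_fld_predict(X, mean, components, w, threshold):
    """Project onto the PCA subspace, then onto the Fisher direction."""
    Z = (X - mean) @ components.T
    return (Z @ w > threshold).astype(int)

# Synthetic stand-in for expression data (hypothetical, not the leukemia data):
# 40 samples, 200 "genes", with the first 10 genes shifted in class 1.
rng = np.random.default_rng(0)
n_per_class, n_genes = 20, 200
shift = np.zeros(n_genes)
shift[:10] = 3.0
X = np.vstack([
    rng.normal(size=(n_per_class, n_genes)),
    rng.normal(size=(n_per_class, n_genes)) + shift,
])
y = np.array([0] * n_per_class + [1] * n_per_class)

mean, components, ratio = pca_fit(X, n_components=5)
Z = (X - mean) @ components.T
w, threshold = fld_fit(Z, y)
pred = pca_fld_predict(X, mean, components, w, threshold)
train_accuracy = (pred == y).mean()
```

Reducing the dimension with PCA first keeps the within-class scatter matrix small and invertible even when genes far outnumber samples, which is the usual situation for microarray data. The accuracy computed here is on the training samples only; a real evaluation would use held-out data or cross-validation.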
