A feature selection method using improved regularized linear discriminant analysis

The investigation of genes using data analysis and computer-based methods has gained widespread attention as a means of solving the human cancer classification problem. DNA microarray gene expression datasets are readily used for this purpose. In this paper, we propose a feature selection method that uses an improved regularized linear discriminant analysis technique to select the genes most important for human cancer classification. Experiments on several DNA microarray gene expression datasets yield promising results compared with several existing feature selection methods.
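The abstract does not describe the algorithm itself, so the following is only a minimal sketch of one generic way that regularized linear discriminant analysis can rank genes. It is not the improved method proposed in the paper. It assumes a fixed ridge-style regularization parameter `alpha` added to the within-class scatter matrix, and it assumes that a gene's importance can be read from the norm of its row in the resulting projection matrix. The function names `rlda_feature_scores` and `select_top_features` are hypothetical.

```python
import numpy as np


def rlda_feature_scores(X, y, alpha=1.0):
    """Score each feature (gene) with a generic regularized LDA.

    Assumption: importance = Euclidean norm of the feature's row in the
    projection matrix W. This is an illustrative rule, not the paper's.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n_features = X.shape[1]
    overall_mean = X.mean(axis=0)

    S_w = np.zeros((n_features, n_features))  # within-class scatter
    S_b = np.zeros((n_features, n_features))  # between-class scatter
    for c in classes:
        X_c = X[y == c]
        class_mean = X_c.mean(axis=0)
        centered = X_c - class_mean
        S_w += centered.T @ centered
        diff = (class_mean - overall_mean)[:, None]
        S_b += X_c.shape[0] * (diff @ diff.T)

    # Regularization: S_w is singular when samples << genes, so add alpha * I.
    S_w_reg = S_w + alpha * np.eye(n_features)

    # Solve S_b w = lambda * S_w_reg w through a Cholesky factor of S_w_reg,
    # which turns it into a symmetric eigenproblem.
    L = np.linalg.cholesky(S_w_reg)
    B = np.linalg.solve(L, np.linalg.solve(L, S_b).T)  # L^-1 S_b L^-T
    B = (B + B.T) / 2.0  # remove round-off asymmetry
    _, V = np.linalg.eigh(B)  # eigenvalues in ascending order

    # Keep the (number of classes - 1) leading discriminant directions.
    V_top = V[:, ::-1][:, : len(classes) - 1]
    W = np.linalg.solve(L.T, V_top)  # map back: w = L^-T v

    return np.linalg.norm(W, axis=1)


def select_top_features(X, y, k, alpha=1.0):
    """Return the indices of the k highest-scoring features."""
    scores = rlda_feature_scores(X, y, alpha=alpha)
    return np.argsort(-scores)[:k]
```

Calling `select_top_features(X, y, k=50)` on a samples-by-genes matrix `X` with class labels `y` returns 50 gene indices. This sketch builds a full genes-by-genes scatter matrix, so it is practical only for a modest number of genes; microarray-scale data would need a reduced-dimension formulation.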
