Microarray data classification using Fuzzy K-Nearest Neighbor

Microarray datasets may contain a huge number of insignificant and irrelevant features, which might lead to loss of useful information. Feature sets that are both highly relevant to the classes and highly significant are generally preferred during feature selection, because the selected features determine how samples are assigned to their respective classes. This selection criterion has attracted considerable attention from researchers and practitioners in DNA microarray classification. In this paper, the K-Nearest Neighbor (K-NN) and Fuzzy K-Nearest Neighbor (Fuzzy K-NN) algorithms are used to classify microarray datasets, with the t-test as the feature selection method. Further, this paper presents a comparative analysis of the classification accuracy obtained by Fuzzy K-NN against K-NN and other existing models available in the literature. Standard performance parameters from the literature, namely precision, recall, specificity, F-measure, ROC curve, and accuracy, are used in this comparison to analyze the behavior of the classifiers. The results indicate that the Fuzzy K-NN model is the most suitable classification model among K-NN and the other classifiers considered.
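The pipeline described in the abstract (t-test feature ranking followed by Fuzzy K-NN classification) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a two-class problem with labels 0 and 1, a Welch-style t-statistic for ranking, crisp (0/1) training memberships, and the commonly used inverse-distance membership weighting with fuzzifier m. The function names, the toy data, and the values k=5 and m=2 are all hypothetical choices made for illustration.

```python
import math
from statistics import mean, variance


def t_scores(X, y):
    """Absolute Welch t-statistic of each feature (assumes labels 0 and 1)."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        scores.append(abs(mean(a) - mean(b)) / se if se > 0 else 0.0)
    return scores


def select_top_features(X, y, n_keep):
    """Indices of the n_keep features with the largest |t|."""
    scores = t_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:n_keep]


def fuzzy_knn_predict(X_train, y_train, x, k=3, m=2.0):
    """Return (predicted label, class memberships) for one query sample x.

    Each of the k nearest neighbours votes for its own class with weight
    1 / distance**(2 / (m - 1)); the weights are normalised to sum to 1.
    """
    nearest = sorted(
        (math.dist(row, x), lab) for row, lab in zip(X_train, y_train)
    )[:k]
    exponent = 2.0 / (m - 1.0)
    # max(..., 1e-12) guards against division by zero for exact matches
    weights = [(1.0 / max(d, 1e-12) ** exponent, lab) for d, lab in nearest]
    total = sum(w for w, _ in weights)
    memberships = {}
    for w, lab in weights:
        memberships[lab] = memberships.get(lab, 0.0) + w / total
    label = max(memberships, key=memberships.get)
    return label, memberships


# Hypothetical toy data: 6 samples, 3 "genes"; only gene 0 separates the classes.
X = [
    [1.0, 5.0, 0.2],
    [1.2, 4.8, 0.9],
    [0.9, 5.1, 0.4],
    [3.0, 5.0, 0.3],
    [3.2, 4.9, 0.8],
    [2.9, 5.2, 0.5],
]
y = [0, 0, 0, 1, 1, 1]

keep = select_top_features(X, y, n_keep=1)
X_sel = [[row[j] for j in keep] for row in X]

query = [2.3, 5.0, 0.6]
query_sel = [query[j] for j in keep]
label, memberships = fuzzy_knn_predict(X_sel, y, query_sel, k=5, m=2.0)
print(keep, label, memberships)
```

Unlike plain K-NN, which returns only the majority label, the fuzzy variant also returns a membership degree per class, so a sample lying between two classes can be flagged as ambiguous rather than assigned with false certainty.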
