A signal-to-noise classification model for identification of differentially expressed genes from gene expression data

A major focus in cancer research is the identification of genetic markers, or biomarkers. Building a robust binary classifier requires first finding the differentially expressed genes (key genes); selecting these biomarker genes is therefore a preprocessing step for cancer classification. In this paper, we compare two approaches for selecting biomarkers from the Leukemia data set. The first approach combines k-means clustering with signal-to-noise ratio (SNR) gene ranking: the top-scoring genes from each cluster are selected and passed to the classifiers. The second approach uses SNR ranking alone for feature selection. To validate both approaches, we use k-nearest neighbor (kNN), support vector machine (SVM), probabilistic neural network (PNN), and feed-forward neural network (FNN) classifiers. With the first approach and five genes, we obtain accuracies of 100%, 96%, and 96% with SVM, kNN, and PNN, respectively, while the performance of FNN is 2.17 with 10 genes. With the second approach and five genes, we obtain accuracies of 96%, 96%, and 62% for SVM, kNN, and PNN, respectively, and the performance of FNN is 2.52 with 10 genes.
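
The two selection strategies described above can be illustrated with a minimal sketch. The abstract does not state the exact SNR formula or the clustering settings, so the code assumes the widely used definition SNR = (mean of class 1 - mean of class 0) / (std of class 1 + std of class 0), binary labels coded as 0 and 1, and a plain Lloyd's k-means over gene profiles. All function names (`snr_scores`, `top_genes_by_snr`, `kmeans_labels`, `top_gene_per_cluster`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def snr_scores(X, y):
    """Per-gene SNR for a samples x genes matrix X and 0/1 labels y.
    Assumed definition: (mean_1 - mean_0) / (std_1 + std_0)."""
    X = np.asarray(X, dtype=float)
    pos, neg = X[y == 1], X[y == 0]
    denom = pos.std(axis=0) + neg.std(axis=0)
    # Avoid division by zero for genes that are constant in both classes.
    denom = np.where(denom == 0, np.finfo(float).eps, denom)
    return (pos.mean(axis=0) - neg.mean(axis=0)) / denom

def top_genes_by_snr(X, y, n_genes):
    """SNR-only selection: indices of the n_genes with largest |SNR|."""
    scores = np.abs(snr_scores(X, y))
    return np.argsort(scores)[::-1][:n_genes]

def kmeans_labels(profiles, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means over rows of `profiles`; returns one label per row."""
    profiles = np.asarray(profiles, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise centres from k distinct rows (fancy indexing returns a copy).
    centers = profiles[rng.choice(len(profiles), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((profiles[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = profiles[labels == c].mean(axis=0)
    return labels

def top_gene_per_cluster(X, y, k, seed=0):
    """Clustering + SNR selection: cluster genes, keep the best-|SNR| gene per cluster."""
    X = np.asarray(X, dtype=float)
    labels = kmeans_labels(X.T, k, seed=seed)  # each gene is one profile (row of X.T)
    scores = np.abs(snr_scores(X, y))
    selected = []
    for c in range(k):
        members = np.flatnonzero(labels == c)
        if members.size:  # skip clusters that ended up empty
            selected.append(members[scores[members].argmax()])
    return np.array(selected)
```

In this sketch, `top_gene_per_cluster(X, y, k=5)` would correspond to choosing five genes under the first approach and `top_genes_by_snr(X, y, 5)` to the second; the selected columns of `X` would then be passed to whichever classifier is being validated.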
