Active learning using rough fuzzy classifier for cancer prediction from microarray gene expression data

Cancer classification from microarray gene expression data is an important research area in computational biology and bioinformatics. Traditional supervised techniques often fail to reach the desired accuracy because the number of clinically labeled patterns is very small. In such situations, active learning can play an important role: it computationally selects only a few of the most informative (confusing) samples, which are labeled by experts and added to the training set, which in turn can improve prediction accuracy. In this work, a novel active learning method using a rough-fuzzy classifier (ALRFC) is proposed for cancer sample classification from gene expression data. The proposed technique can handle the uncertainty, overlap, and indiscernibility usually present in the subtype classes of gene expression data. The proposed algorithm is tested on different publicly available benchmark cancer datasets, and its performance is compared with that of three other active learning techniques, one semi-supervised classification algorithm, and two (non-active) supervised counterpart learning techniques in terms of prediction accuracy, precision, recall, F1-measure, and kappa. The experimental results establish the superiority of the proposed method for cancer prediction over the other state-of-the-art techniques. Paired t-test results also confirm, for most of the datasets, that the improvements achieved by the proposed method over the other methods are statistically significant.
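The query step described in the abstract (pick the most confusing unlabeled samples, have an expert label them, add them to the training set) can be illustrated with a generic pool-based loop. The sketch below is not the ALRFC method: as an assumption for illustration, the rough-fuzzy classifier is replaced by a simple soft membership based on distance to class centroids, and "confusing" is taken to mean a small gap between the two largest memberships. All function names (`memberships`, `margin`, `active_learning`, `predict`) and the toy two-class data are hypothetical.

```python
import random


def memberships(x, labeled):
    """Soft class memberships from inverse squared distance to each class centroid.

    Assumption: this stands in for a fuzzy classifier; it is NOT the rough-fuzzy
    classifier of the paper.
    """
    by_class = {}
    for features, label in labeled:
        by_class.setdefault(label, []).append(features)
    weights = {}
    for label, rows in by_class.items():
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist_sq = sum((a - b) ** 2 for a, b in zip(x, centroid))
        weights[label] = 1.0 / (dist_sq + 1e-9)  # small constant avoids division by zero
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def margin(x, labeled):
    """Gap between the two largest memberships; a small gap marks a confusing sample."""
    top = sorted(memberships(x, labeled).values(), reverse=True)
    return top[0] - top[1] if len(top) > 1 else top[0]


def predict(x, labeled):
    """Return the class with the largest membership."""
    m = memberships(x, labeled)
    return max(m, key=m.get)


def active_learning(labeled, pool, oracle, n_queries):
    """Repeatedly move the most confusing pool sample into the labeled set.

    `oracle` plays the role of the human expert who supplies the true label.
    """
    labeled = list(labeled)
    pool = list(pool)
    for _ in range(min(n_queries, len(pool))):
        idx = min(range(len(pool)), key=lambda i: margin(pool[i], labeled))
        x = pool.pop(idx)
        labeled.append((x, oracle(x)))
    return labeled, pool


# Toy data (hypothetical): two Gaussian blobs standing in for two cancer subtypes.
rng = random.Random(0)


def sample(center, n):
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]


class_a = sample([0.0, 0.0], 20)
class_b = sample([3.0, 3.0], 20)
truth = {tuple(x): "A" for x in class_a}
truth.update({tuple(x): "B" for x in class_b})

seed = [(class_a[0], "A"), (class_b[0], "B")]  # one labeled sample per class
pool = class_a[1:] + class_b[1:]               # everything else is unlabeled

labeled, remaining = active_learning(
    seed, pool, oracle=lambda x: truth[tuple(x)], n_queries=5
)
print(len(labeled), "labeled samples;", len(remaining), "left in the pool")
```

Querying by smallest margin is only one common selection criterion; entropy or least-confidence scores could be substituted in `margin` without changing the loop.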
