Ensemble-based active learning using fuzzy-rough approach for cancer sample classification

Abstract
Background and Objective: Classifying cancer from gene expression data is a major research area in machine learning and medical science. Conventional supervised methods often fail to reach the desired classification accuracy because gene expression datasets contain too few training samples. Ensemble-based active learning can be effective in this setting: each base classifier identifies a few informative samples, and the decisions of all base classifiers are combined to select the most informative ones. These samples are labeled by subject experts and added to the training set, which can improve classification accuracy.
Method: We propose a novel ensemble-based active learning method using a fuzzy-rough approach for cancer sample classification from microarray gene expression data. The proposed method can handle the uncertainty, overlap and indiscernibility usually present in the subtype classes of gene expression data, and can improve the accuracy of the individual base classifiers when training samples are limited.
Results: The proposed method is validated on eight microarray gene expression datasets. Its performance in terms of classification accuracy, precision, recall, F1-measure and kappa is compared with six other methods. The improvements in accuracy over the nearest competing method are 2.96%, 9.34%, 0.93%, 3.69%, 7.2% and 4.53% for the Colon cancer, Prostate cancer, SRBCT, Ovarian cancer, DLBCL and Central nervous system datasets, respectively. Paired t-tests indicate that the results in favor of the proposed method are statistically significant for most of the datasets.
Conclusion: The proposed method is an effective general-purpose ensemble-based active learning method built on the fuzzy-rough concept, and can therefore be applied to other classification problems in the future.
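
The query step of ensemble-based active learning described in the abstract can be illustrated with a minimal sketch. This is not the proposed fuzzy-rough method: as a stand-in, it assumes a committee of plain k-nearest-neighbour classifiers with different k, and it measures informativeness by vote entropy (committee disagreement). The function names, the `oracle` callable standing in for the subject expert, and the toy one-dimensional data are all hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, x, k):
    """Majority label among the k nearest labeled samples (Euclidean distance)."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def vote_entropy(votes):
    """Entropy of the committee's votes; zero means full agreement."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())

def active_learning_round(labeled, pool, oracle, ks=(1, 3, 5), batch=1):
    """Query the pool samples the committee disagrees on most and label them."""
    scored = []
    for x in pool:
        # One vote per base classifier (here: k-NN with a different k each).
        votes = [knn_predict(labeled, x, k) for k in ks]
        scored.append((vote_entropy(votes), x))
    # Highest disagreement first; ties keep their original pool order.
    scored.sort(key=lambda t: t[0], reverse=True)
    queried = [x for _, x in scored[:batch]]
    # The oracle (assumed: a subject expert) supplies the true labels.
    new_labeled = labeled + [(x, oracle(x)) for x in queried]
    new_pool = [x for x in pool if x not in queried]
    return new_labeled, new_pool, queried

# Toy data (hypothetical): one feature per sample, two classes.
labeled = [((0.0,), "A"), ((1.0,), "A"), ((2.0,), "A"),
           ((3.0,), "B"), ((10.0,), "B"), ((11.0,), "B")]
pool = [(0.5,), (2.6,), (10.5,)]
oracle = lambda x: "A" if x[0] < 2.8 else "B"

labeled2, pool2, queried = active_learning_round(labeled, pool, oracle)
print(queried)  # the sample near the class boundary: [(2.6,)]
```

In this toy run the committee agrees on the samples at 0.5 and 10.5 but splits on the sample at 2.6, so only that sample is sent to the oracle and moved into the training set. Repeating the round until a labeling budget is spent gives the overall loop.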
