Cancer Classification from Gene Expression Data Using Fuzzy-Rough Techniques: An Empirical Study

Cancer classification from gene expression data is one of the most challenging research areas in computational biology, bioinformatics, and machine learning, because the number of clinically labeled samples is very small compared to the number of genes present. In addition, the cancer subtype classes are often highly overlapping, imprecise, and indiscernible in nature. Various machine learning techniques have been developed and applied to gene expression data for cancer sample classification. In this article, an empirical study of cancer classification from microarray gene expression data is performed using fuzzy-rough nearest neighbour techniques. The performance of four classifiers is investigated: fuzzy nearest neighbour, fuzzy-rough nearest neighbour, vaguely quantified fuzzy-rough nearest neighbour, and ordered weighted average based fuzzy-rough nearest neighbour. The experiments are carried out on eight publicly available real-life microarray gene expression cancer datasets. Classifier performance is assessed using percentage accuracy, precision, recall, macro-averaged F1 measure, micro-averaged F1 measure, and kappa. The investigated methods are also compared using a paired t-test. The fuzzy-rough nearest neighbour method is found to perform better on most of the datasets for cancer classification.

Keywords: Cancer Classification, Fuzzy-Rough Set, Vaguely Quantified, Ordered Weighted Average, Microarray Gene Expression Data.
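As an illustration of the evaluation measures named in the abstract, the following minimal Python sketch computes accuracy, macro-averaged F1, micro-averaged F1, and Cohen's kappa from lists of true and predicted labels. It is not the authors' code: the function name `evaluate` and the toy labels are hypothetical, and macro-averaged F1 is assumed here to be the unweighted mean of the per-class F1 scores, which the abstract does not specify.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return accuracy, macro-F1, micro-F1 and Cohen's kappa.

    Hypothetical helper for illustration; assumes single-label,
    multi-class predictions given as two equal-length lists.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)

    # Per-class true positives, false positives and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1

    def f1(tp_c, fp_c, fn_c):
        denom = 2 * tp_c + fp_c + fn_c
        return 2 * tp_c / denom if denom else 0.0

    accuracy = sum(tp.values()) / n

    # Macro-F1: assumed to be the unweighted mean of per-class F1 scores.
    macro_f1 = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)

    # Micro-F1: F1 computed from counts pooled over all classes.
    micro_f1 = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))

    # Cohen's kappa: observed agreement corrected for chance agreement.
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else 1.0

    return {
        "accuracy": accuracy,
        "macro_f1": macro_f1,
        "micro_f1": micro_f1,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Toy labels, not taken from any of the datasets in the study.
    truth = ["A", "A", "A", "B", "B", "B"]
    preds = ["A", "A", "B", "B", "B", "A"]
    print(evaluate(truth, preds))
```

For single-label classification, micro-averaged F1 coincides with accuracy, so reporting macro-averaged F1 and kappa alongside it is what reveals poor performance on minority cancer subtypes.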
