Identification of Biomarker on Biological and Gene Expression data using Fuzzy Preference Based Rough Set

Abstract: Cancer is fast becoming an alarming cause of human death. However, it has been reported that if the disease is detected at an early stage, diagnosed, and treated appropriately, the patient has a better chance of long-term survival. Machine learning techniques combined with feature selection contribute greatly to cancer detection, because an efficient feature-selection method can remove redundant features. In this paper, a Fuzzy Preference-Based Rough Set (FPRS) combined with a Support Vector Machine (SVM) is applied to predict cancer biomarkers in biological and gene expression datasets. Biomarkers are determined using three FPRS models, namely Fuzzy Upward Consistency (FUC), Fuzzy Downward Consistency (FLC), and Fuzzy Global Consistency (FGC). The efficiency of the three models with SVM is demonstrated on five datasets, and the biomarkers identified by the FUC model are reported.
