Feature Ranking using Robust Fuzzy Score Function for Gene Expression Data

Feature engineering plays a vital role in selecting the relevant features that carry the most predictive value. In this paper, we propose a Gaussian fuzzy score function that ranks features in descending order of their score values. The mean and variance of the Gaussian fuzzy score function are determined using the mean of k-middle, which plays an important role in determining the complementary information of the features in the dataset. The features selected by the proposed ranking method are fed to four widely used classifiers: a linear kernel support vector machine, a radial basis function kernel support vector machine, a random forest, and a softmax classifier. To show the effectiveness of the proposed approach, we compare its performance with that of state-of-the-art methods on five large-scale gene expression datasets.
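The abstract does not give the exact form of the score function, so the following Python sketch is only an illustration of the general idea under stated assumptions. It assumes that the "mean of k-middle" is the average of the k central values of a sorted sample, that the variance is estimated the same way from squared deviations, and that a feature's score is the average Gaussian membership of samples in their own class minus their membership in other classes. The names `mean_of_k_middle`, `gaussian_membership`, `fuzzy_scores`, and `rank_features` are hypothetical and are not taken from the paper.

```python
import numpy as np


def mean_of_k_middle(values, k):
    """Average of the k central values of the sorted sample (assumed definition)."""
    v = np.sort(np.asarray(values, dtype=float))
    k = max(1, min(int(k), v.size))      # clamp k to a valid range
    start = (v.size - k) // 2            # index where the middle window begins
    return float(v[start:start + k].mean())


def gaussian_membership(x, mu, sigma):
    """Gaussian fuzzy membership of x in a set centred at mu with spread sigma."""
    return np.exp(-0.5 * ((np.asarray(x, dtype=float) - mu) / sigma) ** 2)


def fuzzy_scores(X, y, k, eps=1e-12):
    """Hypothetical per-feature score (an assumption, not the paper's formula).

    For each class, a Gaussian set is fitted to the feature using the mean of
    k-middle for both centre and variance. The score rewards features whose
    own-class samples have high membership and other-class samples low.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        col = X[:, j]
        for c in classes:
            own, other = col[y == c], col[y != c]
            mu = mean_of_k_middle(own, k)
            var = mean_of_k_middle((own - mu) ** 2, k)
            sigma = np.sqrt(var) + eps   # eps avoids division by zero
            scores[j] += (gaussian_membership(own, mu, sigma).mean()
                          - gaussian_membership(other, mu, sigma).mean())
    return scores / classes.size


def rank_features(X, y, k):
    """Return feature indices and scores sorted by descending score."""
    scores = fuzzy_scores(X, y, k)
    order = np.argsort(-scores, kind="stable")
    return order, scores[order]


if __name__ == "__main__":
    # Toy data: feature 0 separates the two classes, feature 1 does not.
    X = [[0.0, 1.0], [0.1, 5.0], [0.2, 3.0],
         [5.0, 1.1], [5.1, 5.1], [5.2, 3.1]]
    y = [0, 0, 0, 1, 1, 1]
    order, ranked = rank_features(X, y, k=3)
    print("ranking:", order, "scores:", ranked)
```

Because only the central values enter the estimate, a single extreme expression value has little effect on the fitted centre and spread, which is one plausible reading of why the score is described as robust. The top-ranked indices from `rank_features` could then be used to select the columns passed to a downstream classifier.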
