Optimal Feature Selection using Fuzzy Combination of Feature Subset for Transcriptome Data

Applying machine learning algorithms directly to high-dimensional datasets, such as those encountered in transcriptome analysis, may lead to high time complexity and poor model performance, especially when the number of samples is small compared to the dimensionality. Selecting an optimal set of features therefore becomes an essential task for such datasets. Filter methods are one of the main classes of feature selection techniques: a score is assigned to each feature based on criteria such as information gain, statistical measures, or similarity-based measures, and the best-scoring features are then selected. Applying a filter method to the complete dataset yields features that perform well over the dataset as a whole but may perform poorly in certain regions of the data, reducing accuracy for data points in those regions. To overcome this degradation in performance, we propose two novel methods that assign a robust score using a fuzzy combination of region-specific optimal feature subsets, each obtained with a standard feature selection algorithm (mRMR in this paper). We compare the results with the state-of-the-art feature selection algorithm mRMR (Minimum Redundancy Maximum Relevance) in terms of accuracy on several standard datasets.
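The abstract does not spell out the two proposed methods, so the following is only an illustrative sketch of the general idea (region-specific subsets combined through fuzzy memberships), not the authors' algorithm. It makes several assumptions: the region centers are given (in practice they might come from a clustering step), memberships follow the usual fuzzy c-means formula, a simple class-mean-gap score stands in for mRMR, and each feature's combined score is the sum of the mean memberships of the regions that selected it. All function and variable names are hypothetical.

```python
import math

def memberships(x, centers, m=2.0):
    """Fuzzy c-means style membership of sample x in each region (sums to 1)."""
    d = [math.dist(x, c) for c in centers]
    if any(di == 0.0 for di in d):  # x coincides with at least one center
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        return [h / sum(hits) for h in hits]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** p for dj in d) for di in d]

def filter_score(X, y, f):
    """Stand-in filter (NOT mRMR): gap between the two class means of
    feature f, divided by the feature's range within this region."""
    a = [row[f] for row, lab in zip(X, y) if lab == 0]
    b = [row[f] for row, lab in zip(X, y) if lab == 1]
    if not a or not b:
        return 0.0
    vals = a + b
    spread = (max(vals) - min(vals)) or 1.0
    return abs(sum(a) / len(a) - sum(b) / len(b)) / spread

def fuzzy_combined_scores(X, y, centers, k):
    """Select the top-k features per region, then give every feature the sum
    of the mean memberships of the regions whose subset contains it."""
    n_feat = len(X[0])
    U = [memberships(x, centers) for x in X]
    scores = [0.0] * n_feat
    for r in range(len(centers)):
        # Assumption: each sample is assigned to its highest-membership region.
        idx = [i for i, u in enumerate(U) if u.index(max(u)) == r]
        if not idx:
            continue
        Xr = [X[i] for i in idx]
        yr = [y[i] for i in idx]
        ranked = sorted(range(n_feat),
                        key=lambda f: filter_score(Xr, yr, f), reverse=True)
        weight = sum(u[r] for u in U) / len(U)  # fuzzy weight of region r
        for f in ranked[:k]:
            scores[f] += weight
    return scores

# Toy data: feature 0 separates the classes only in the first region,
# feature 1 only in the second region, feature 2 is weakly informative.
X = [[0.0, 0.0, 0.5], [1.0, 0.1, 0.5], [0.1, 0.1, 0.4], [0.9, 0.0, 0.6],
     [10.0, 10.0, 0.5], [10.1, 11.0, 0.5], [10.1, 10.1, 0.6], [10.0, 10.9, 0.4]]
y = [0, 1, 0, 1, 0, 1, 0, 1]
centers = [[0.5, 0.05, 0.5], [10.05, 10.5, 0.5]]

scores = fuzzy_combined_scores(X, y, centers, k=1)
print(scores)
```

In this toy setting the two locally useful features each receive a share of the score, whereas a single global ranking would have to trade them off against each other. The hard assignment of samples to regions and the additive weighting are simplifications chosen for brevity.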
