A feature selection method based on multiple kernel learning with expression profiles of different types

Background: With the development of high-throughput technology, researchers can acquire large amounts of expression data of different types from several public databases. Because most of these datasets have few samples but hundreds or thousands of features, extracting informative features from expression data effectively and robustly with feature selection techniques is both challenging and crucial. Many feature selection approaches have been proposed and applied to expression data of different types. However, most of these methods are evaluated on only a single type of expression data, using classification accuracy or error rate as the sole measure.

Results: In this article, we propose a hybrid feature selection method based on Multiple Kernel Learning (MKL) and evaluate its performance on expression datasets of different types. First, the relevance between each feature and the classification of samples is measured with the optimizing function of MKL. In this step, an iterative gradient descent process optimizes both the parameters of the Support Vector Machine (SVM) and the kernel confidences. Then, a set of relevant features is selected by ranking the features according to their optimizing-function values. Finally, we apply an embedded forward selection scheme to detect compact feature subsets within the relevant feature set.

Conclusions: We compare not only the classification accuracy against other methods, but also the stability, similarity and consistency of the different algorithms. The proposed method shows a satisfactory feature selection capability on expression datasets of different types under several performance measures.
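The three steps described above (MKL-based relevance, ranking, forward selection) can be illustrated with a small NumPy sketch. It is not the authors' implementation: every function name and parameter (`gamma`, `C`, `step`, `outer`) is hypothetical, and several choices are assumptions. It uses one Gaussian kernel per feature, an SVM dual without a bias term solved by projected gradient ascent, a SimpleMKL-style normalized gradient step on the kernel weights d (standing in for the "kernel confidence") constrained to the simplex, ranking by each feature's term s_m in the MKL objective J(d) = sum_i alpha_i - sum_m d_m s_m, and a greedy forward pass scored on a validation split as a stand-in for the embedded forward selection.

```python
import numpy as np


def feature_kernels(X, gamma=1.0):
    """One Gaussian kernel per feature: K[m, i, j] = exp(-gamma * (x_im - x_jm)**2)."""
    diff = X[:, None, :] - X[None, :, :]                  # (n, n, p)
    return np.exp(-gamma * diff ** 2).transpose(2, 0, 1)  # (p, n, n)


def svm_dual(K, y, C=1.0, iters=300):
    """Bias-free SVM dual (an assumption): maximize sum(a) - 0.5 a^T Q a over 0 <= a <= C."""
    Q = np.outer(y, y) * K
    lr = 1.0 / (np.linalg.eigvalsh(Q).max() + 1e-12)     # 1/L step for the concave quadratic
    alpha = np.zeros(len(y))
    for _ in range(iters):
        alpha = np.clip(alpha + lr * (1.0 - Q @ alpha), 0.0, C)  # projected gradient ascent
    return alpha


def project_simplex(v):
    """Euclidean projection of v onto {d : d >= 0, sum(d) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)


def mkl_relevance(Ks, y, C=1.0, outer=20, step=0.05):
    """Alternate SVM solves with projected-gradient updates of the kernel weights d.

    J(d) = sum(alpha) - sum_m d_m * s_m, with s_m = 0.5 (alpha*y)^T K_m (alpha*y),
    so dJ/dd_m = -s_m. Returns the final weights d and per-feature terms s.
    """
    p = Ks.shape[0]
    d = np.full(p, 1.0 / p)
    for _ in range(outer):
        alpha = svm_dual(np.tensordot(d, Ks, axes=1), y, C)  # SVM on the combined kernel
        ay = alpha * y
        s = 0.5 * np.einsum("i,mij,j->m", ay, Ks, ay)
        # Descent step on J (i.e. along +s), normalized, then back onto the simplex.
        d = project_simplex(d + step * s / (np.abs(s).max() + 1e-12))
    return d, s


def val_accuracy(K, y, train, val, C=1.0):
    """Train the bias-free SVM on `train` rows of K and score it on `val` rows."""
    alpha = svm_dual(K[np.ix_(train, train)], y[train], C)
    pred = np.sign(K[np.ix_(val, train)] @ (alpha * y[train]))
    return np.mean(pred == y[val])


def forward_select(X, y, order, train, val, gamma=1.0, C=1.0):
    """Greedy pass over ranked features; keep a feature only if validation accuracy improves."""
    selected, best = [], 0.0
    for f in order:
        cand = selected + [int(f)]
        K = feature_kernels(X[:, cand], gamma).mean(axis=0)  # equal-weight combined kernel
        acc = val_accuracy(K, y, train, val, C)
        if acc > best:
            selected, best = cand, acc
    return selected, best


# Hypothetical toy data: features 0 and 1 carry the class signal, the rest are noise.
rng = np.random.default_rng(0)
n, p = 40, 6
y = np.where(np.arange(n) < n // 2, 1.0, -1.0)
X = rng.normal(size=(n, p))
X[:, 0] += 2.5 * y
X[:, 1] += 2.0 * y

d, s = mkl_relevance(feature_kernels(X), y)
order = np.argsort(-s)                       # rank features by their MKL objective term
idx = rng.permutation(n)
train, val = idx[: n // 2], idx[n // 2:]
selected, best = forward_select(X, y, order, train, val)
print("ranking:", order.tolist(), "selected:", selected, "val acc:", best)
```

On this toy data the two informative features should rank first, and noise features enter the forward pass only if they raise validation accuracy. Ranking by the objective term s_m rather than by the weights d is a design choice here, since the simplex projection can drive several weights to exactly zero and leave ties.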
