Sparse Bayesian approach for feature selection

This paper employs a sparse Bayesian approach to enable the Probabilistic Classification Vector Machine (PCVM) to select a relevant subset of features. Owing to its probabilistic outputs and its ability to optimize the regularization terms automatically, the sparse Bayesian framework has shown great advantages in real-world applications. However, Gaussian priors that assign the same prior to different classes may lead to instability in classification. The PCVM therefore adopts an improved Gaussian prior whose sign is determined by the class label. In this paper, we present a joint classifier and feature learning algorithm: the Feature Selection Probabilistic Classification Vector Machine (FPCVM). The improved Gaussian priors, termed truncated Gaussian priors, are introduced into the feature space to perform feature selection and into the sample space to induce sparsity in the weight parameters. The expectation-maximization (EM) algorithm is employed to obtain a maximum a posteriori (MAP) estimate of these parameters. In experiments, both classification accuracy and feature-selection performance are evaluated on synthetic datasets, benchmark datasets, and high-dimensional gene expression datasets.
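As a concrete illustration of the truncated Gaussian prior described above, here is a minimal NumPy sketch, not the authors' implementation: the function name, the -1/+1 label encoding, and the per-weight precision vector `alpha` are assumptions for illustration. It evaluates the log-density of the commonly used truncated form, a zero-mean Gaussian restricted to the half-line whose sign matches the class label and renormalized by a factor of 2.

```python
import numpy as np

def truncated_gaussian_log_prior(w, y, alpha):
    """Log-density of a truncated Gaussian prior over the weights.

    Each weight w_i gets a zero-mean Gaussian prior with precision
    alpha_i, truncated to the half-line whose sign matches the class
    label y_i (encoded as -1/+1), i.e. (assumed form)

        p(w_i | alpha_i) = 2 * N(w_i | 0, 1/alpha_i)  if y_i * w_i >= 0
                         = 0                           otherwise.

    The factor 2 renormalizes the Gaussian after half of its mass
    is truncated away.
    """
    w, y, alpha = (np.asarray(a, dtype=float) for a in (w, y, alpha))
    log_gauss = 0.5 * (np.log(alpha) - np.log(2.0 * np.pi)) - 0.5 * alpha * w ** 2
    logp = np.where(y * w >= 0, np.log(2.0) + log_gauss, -np.inf)
    return logp.sum()

# Weights whose signs agree with their labels get a finite log-prior;
# a single sign violation drives the joint log-prior to -inf.
y = np.array([+1, -1, +1])
print(truncated_gaussian_log_prior([0.5, -0.2, 0.1], y, np.ones(3)))  # finite
print(truncated_gaussian_log_prior([0.5, +0.2, 0.1], y, np.ones(3)))  # -inf
```

In a MAP/EM setting of the kind the abstract describes, a log-prior of this shape is what pushes each weight toward zero (sparsity) while forbidding it from taking the sign opposite to its class label.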
