Local feature selection using Gaussian process regression

Most feature selection methods determine a single global subset of features onto which all data instances are projected in order to improve classification accuracy. An attractive alternative is to adaptively find a local subset of features for each data instance, so that each instance is classified in its own selective subspace. This paper presents a novel application of Gaussian Processes (GPs) that improves classification performance by learning a set of functions that quantify the discriminative power of each feature. Specifically, a GP regression is built for each available feature to estimate its discriminative properties across the entire input space. By locally combining these regressions, a discriminative subspace can be obtained at any position of the input space. New instances are then classified with a K-NN classifier that operates in these local subspaces. Experimental results show that, by using local discriminative subspaces, we reach higher classification accuracy than alternative state-of-the-art feature selection approaches.
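To make the pipeline concrete, the sketch below illustrates one way the idea can be realized, under stated assumptions: each feature gets its own GP regression fit to a simple neighbourhood-agreement score, which is an assumed proxy for the discriminative criterion (the abstract does not specify the exact target the authors regress on), and at query time the features with the highest predicted scores form the local subspace in which a K-NN vote is taken. The function names and the scoring heuristic are illustrative, not the authors' implementation; scikit-learn's GaussianProcessRegressor stands in for the per-feature GP regression.

# Minimal sketch of local feature selection via per-feature GP regressions.
# The neighbourhood-agreement score used as the regression target is an
# assumption; the paper's exact discriminative criterion is not given here.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.neighbors import NearestNeighbors

def fit_feature_gps(X, y, n_neighbors=10):
    """Fit one GP per feature, mapping input location -> discriminative score."""
    n, d = X.shape
    nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X)
    _, idx = nn.kneighbors(X)          # first neighbour of each point is itself
    gps = []
    for j in range(d):
        # Score feature j at each training point: fraction of neighbours that
        # share the point's label and lie close to it along feature j.
        scores = np.empty(n)
        for i in range(n):
            neigh = idx[i, 1:]
            same = y[neigh] == y[i]
            close = np.abs(X[neigh, j] - X[i, j]) < np.std(X[:, j])
            scores[i] = np.mean(same & close)
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-2)
        gp.fit(X, scores)              # GP regression over the full input space
        gps.append(gp)
    return gps

def classify_locally(x_query, X, y, gps, n_features=3, k=5):
    """Select the locally most discriminative features, then run K-NN there."""
    scores = np.array([gp.predict(x_query.reshape(1, -1))[0] for gp in gps])
    top = np.argsort(scores)[::-1][:n_features]   # local discriminative subspace
    dists = np.linalg.norm(X[:, top] - x_query[top], axis=1)
    neigh = np.argsort(dists)[:k]
    labels, counts = np.unique(y[neigh], return_counts=True)
    return labels[np.argmax(counts)]              # majority vote among neighbours

Under these assumptions, usage would look like gps = fit_feature_gps(X_train, y_train) at training time, followed by classify_locally(x, X_train, y_train, gps) for each test instance x, so that every query is classified in its own locally selected subspace.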
