The K-nearest-neighbor decision rule assigns an object of unknown class to the plurality class among the K labeled "training" objects that are closest to it. Closeness is usually defined in terms of a metric distance on the Euclidean space with the input measurement variables as axes. The metric chosen to define this distance can strongly affect performance. An optimal choice depends on the problem at hand, as characterized by the respective class distributions on the input measurement space, and, within a given problem, on the location of the unknown object in that space. In this paper new types of K-nearest-neighbor procedures are described that estimate the local relevance of each input variable, or of linear combinations of the variables, for each individual point to be classified. This information is then used to customize, separately for each such object, the metric used to define distance from it when finding its nearest neighbors. These procedures are a hybrid between regular K-nearest-neighbor methods and the tree-structured recursive partitioning techniques popular in statistics and machine learning.
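The following is a minimal sketch of the general idea of a feature-weighted K-nearest-neighbor rule, not the paper's actual procedure for estimating local relevances: the function name `weighted_knn_predict` and the fixed, caller-supplied `weights` vector are illustrative assumptions standing in for the locally estimated relevances described above.

```python
import numpy as np

def weighted_knn_predict(x, X_train, y_train, k=5, weights=None):
    """Classify a single query point x by plurality vote among its k
    nearest training points under a diagonally weighted Euclidean metric.
    `weights` plays the role of the per-feature relevances; here it is
    simply supplied by the caller rather than estimated locally."""
    if weights is None:
        weights = np.ones(X_train.shape[1])
    # Weighted squared Euclidean distance from x to every training point.
    diffs = X_train - x
    dists = np.sum(weights * diffs ** 2, axis=1)
    # Indices of the k closest training points.
    nearest = np.argsort(dists)[:k]
    # Plurality vote among their class labels.
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(100, 2))
    y_train = (X_train[:, 0] > 0).astype(int)  # class depends only on feature 0
    # Up-weighting the relevant feature and down-weighting the irrelevant one
    # mimics the effect of a locally adapted metric.
    print(weighted_knn_predict(np.array([0.5, -2.0]), X_train, y_train,
                               k=5, weights=np.array([1.0, 0.1])))
```

In the procedures described in the paper, the weights would vary with the location of the query point rather than being fixed in advance.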