Flexible Metric Nearest Neighbor Classification

The K-nearest-neighbor decision rule assigns an object of unknown class to the plurality class among the K labeled "training" objects that are closest to it. Closeness is usually defined in terms of a metric distance on the Euclidean space with the input measurement variables as axes. The metric chosen to define this distance can strongly affect performance. An optimal choice depends on the problem at hand, as characterized by the respective class distributions on the input measurement space, and, within a given problem, on the location of the unknown object in that space. In this paper, new types of K-nearest-neighbor procedures are described that estimate the local relevance of each input variable, or their linear combinations, for each individual point to be classified. This information is then used to separately customize the metric used to define distance from that object in finding its nearest neighbors. These procedures are a hybrid between regular K-nearest-neighbor methods and tree-structured recursive partitioning techniques popular in statistics and machine learning.
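To make the dependence on the metric concrete, the following is a minimal sketch of the K-nearest-neighbor rule with an optional per-variable weighting of the squared Euclidean distance. The function name `knn_classify`, the `weights` parameter, and the toy data are illustrative assumptions, not taken from the paper. In particular, the weights here are supplied by hand, whereas the procedures described above estimate the relevance of each variable separately for each point to be classified.

```python
import math
from collections import Counter


def knn_classify(query, points, labels, k, weights=None):
    """Return the plurality class among the k training points closest to query.

    weights is a hypothetical per-variable scaling of the squared
    differences; None gives the ordinary Euclidean metric.
    """
    if weights is None:
        weights = [1.0] * len(query)

    def dist(p):
        return math.sqrt(
            sum(w * (a - b) ** 2 for w, a, b in zip(weights, p, query))
        )

    # Indices of the training points, nearest first.
    order = sorted(range(len(points)), key=lambda i: dist(points[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy data (assumed): the class depends only on the first variable;
# the second variable is irrelevant but has a much larger spread.
points = [(0.0, 5.0), (0.1, 4.0), (0.2, 6.0),
          (1.0, 0.0), (0.9, 1.0), (1.1, 0.5)]
labels = ["a", "a", "a", "b", "b", "b"]
query = (0.1, 0.5)

plain = knn_classify(query, points, labels, k=3)
weighted = knn_classify(query, points, labels, k=3, weights=[1.0, 0.0])
```

On this toy data the unweighted metric is dominated by the irrelevant second variable, so `plain` is `"b"`, while giving that variable zero weight yields `"a"`, the class the first variable indicates.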