Boosting the Performance of Nearest Neighbour Methods with Feature Selection

This paper describes a Nearest Neighbour procedure for variable selection in function approximation, pattern classification, and time series prediction. Given a training set of input/output vector pairs, the procedure identifies a subset of input vector components that effectively capture the input-output relationship implicit in the training set. The utility of this procedure is demonstrated on numerous data sets from the UCI repository of machine learning databases and on the Mackey-Glass time series prediction problem. On a comprehensive set of benchmark problems, the procedure performs comparably to much more complex boosted C4.5 decision trees.
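
To make the idea of selecting input components with a nearest neighbour criterion concrete, the following is a minimal sketch and not necessarily the procedure used in this paper. The abstract does not state the search strategy or the scoring criterion, so the sketch assumes greedy forward selection scored by the leave-one-out 1-NN misclassification rate. The function names `loo_1nn_error` and `forward_select` and the toy data are hypothetical and introduced only for illustration.

```python
import numpy as np

def loo_1nn_error(X, y, features):
    """Leave-one-out 1-NN misclassification rate using only the given columns."""
    Z = X[:, features]
    # Pairwise squared Euclidean distances in the selected subspace.
    d = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a point may not be its own neighbour
    nearest = d.argmin(axis=1)
    return float(np.mean(y[nearest] != y))

def forward_select(X, y):
    """Assumed search strategy: greedily add the column that lowers LOO error most."""
    selected, best_err = [], np.inf
    remaining = list(range(X.shape[1]))
    while remaining:
        errs = [(loo_1nn_error(X, y, selected + [j]), j) for j in remaining]
        err, j = min(errs)
        if err >= best_err:  # stop when no candidate improves the error
            break
        selected.append(j)
        remaining.remove(j)
        best_err = err
    return selected, best_err

# Toy data (an assumption for illustration): only column 2 carries the label.
rng = np.random.default_rng(0)
y_demo = np.arange(60) % 2
X_demo = rng.uniform(size=(60, 4))
X_demo[:, 2] += 10 * y_demo  # the two classes are well separated along column 2

selected, best_err = forward_select(X_demo, y_demo)
print(selected, best_err)
```

On this toy data the sketch should keep only the informative column and discard the three noise columns. For function approximation or time series prediction, the misclassification rate would need to be replaced by a regression error such as the squared error of the nearest neighbour's output, which is again an assumption and not something the abstract specifies.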