Learning weighted metrics to minimize nearest-neighbor classification error

To optimize the accuracy of the nearest-neighbor classification rule, a weighted distance is proposed, along with algorithms to automatically learn the corresponding weights. These weights may be specific to each class and feature, to each individual prototype, or to both. The learning algorithms are derived by (approximately) minimizing the leave-one-out classification error on the given training set. The proposed approach is assessed through a series of experiments with UCI/STATLOG corpora, as well as with a more specific text classification task that entails a very sparse data representation and very high dimensionality. In all these experiments, the proposed approach shows uniformly good behavior, with results comparable to or better than the state-of-the-art results published so far on the same data.
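
As a rough illustration of the kind of learning the abstract describes, the following is a minimal sketch in Python/NumPy: it learns class- and feature-dependent weights for a weighted Euclidean distance by gradient descent on a sigmoid-smoothed approximation of the leave-one-out nearest-neighbor error. The distance form, the ratio-based loss, the `beta` sharpness parameter, and the update rule are illustrative assumptions in the spirit of the abstract, not the paper's exact formulation.

```python
# A minimal, illustrative sketch (NOT the paper's exact algorithm):
# learn one weight vector per class for a weighted Euclidean distance by
# gradient descent on a sigmoid-smoothed leave-one-out 1-NN error.
import numpy as np

def learn_class_feature_weights(X, y, n_epochs=50, lr=0.1, beta=10.0):
    """X: (N, D) training data; y: (N,) int labels assumed to be 0..C-1,
    with at least two samples per class. Returns a (C, D) weight matrix."""
    N, D = X.shape
    C = int(y.max()) + 1
    W = np.ones((C, D))                       # start from the plain L2 metric
    for _ in range(n_epochs):
        grad = np.zeros_like(W)
        for i in range(N):
            x, c = X[i], y[i]
            mask = np.arange(N) != i          # leave sample i out
            Xo, yo = X[mask], y[mask]
            # Each prototype is weighted with the weights of its own class.
            d2 = np.sum((W[yo] * (x - Xo)) ** 2, axis=1)
            same, diff = np.where(yo == c)[0], np.where(yo != c)[0]
            j = same[np.argmin(d2[same])]     # nearest same-class prototype
            k = diff[np.argmin(d2[diff])]     # nearest different-class prototype
            d_s, d_d = np.sqrt(d2[j]), np.sqrt(d2[k])
            r = d_s / (d_d + 1e-12)           # r < 1 <=> correct LOO decision
            s = 1.0 / (1.0 + np.exp(-beta * (r - 1.0)))   # smoothed 0/1 error
            g = beta * s * (1.0 - s)          # d(error)/dr
            # Chain rule through r = d_s / d_d for the two active prototypes.
            grad[yo[j]] += g * W[yo[j]] * (x - Xo[j]) ** 2 / (d_s * d_d + 1e-12)
            grad[yo[k]] -= g * d_s * W[yo[k]] * (x - Xo[k]) ** 2 / (d_d ** 3 + 1e-12)
        W -= lr * grad / N
        W = np.maximum(W, 1e-6)               # keep the learned metric valid
    return W
```

Under these assumptions, a test point x would then be classified by the label of the training prototype minimizing np.sum((W[y] * (x - X)) ** 2, axis=1); the prototype-specific variant mentioned in the abstract would replace the (C, D) class weight matrix with one weight vector per prototype, and the same derivation applies.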
