Proximity-Graph Instance-Based Learning, Support Vector Machines, and High Dimensionality: An Empirical Comparison

Previous experiments with low-dimensional data sets have shown that Gabriel graph methods for instance-based learning are among the best machine learning algorithms for pattern classification applications. However, as the dimensionality of the data grows large, all data points in the training set tend to become Gabriel neighbors of each other, which calls the efficacy of this method into question. Indeed, it has been conjectured that for high-dimensional data, proximity graph methods would have to use sparser graphs, such as relative neighborhood graphs (RNGs) and minimum spanning trees (MSTs), in order to maintain their privileged status. Here the performance of proximity graph methods for instance-based learning that employ Gabriel graphs, relative neighborhood graphs, and minimum spanning trees is compared experimentally on high-dimensional data sets. These methods are also compared empirically against the traditional k-NN rule and support vector machines (SVMs), the leading competitors of proximity graph methods.
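
The density concern raised above can be illustrated with a minimal brute-force sketch. It is not the implementation used in the experiments. It assumes the standard textbook definitions: two points are Gabriel neighbors when no third point lies strictly inside the ball whose diameter is the segment joining them, and they are relative neighbors when no third point lies strictly inside their lune, that is, closer to both of them than they are to each other. The function names (`sq_dist_matrix`, `proximity_edges`, `edge_density`), the uniform random data, and the sample size are all hypothetical choices made for illustration.

```python
import itertools
import random


def sq_dist_matrix(points):
    """Pairwise squared Euclidean distances as a list of lists."""
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]


def proximity_edges(points, kind="gabriel"):
    """Brute-force O(n^3) edge list for a Gabriel graph or an RNG (illustrative)."""
    if kind not in ("gabriel", "rng"):
        raise ValueError("kind must be 'gabriel' or 'rng'")
    d = sq_dist_matrix(points)
    n = len(points)
    edges = []
    for i, j in itertools.combinations(range(n), 2):
        blocked = False
        for k in range(n):
            if k == i or k == j:
                continue
            if kind == "gabriel":
                # k lies strictly inside the ball whose diameter is segment ij
                inside = d[i][k] + d[j][k] < d[i][j]
            else:
                # k lies strictly inside the lune of i and j
                inside = max(d[i][k], d[j][k]) < d[i][j]
            if inside:
                blocked = True
                break
        if not blocked:
            edges.append((i, j))
    return edges


def edge_density(points, kind="gabriel"):
    """Fraction of all point pairs that are joined by an edge."""
    n = len(points)
    return len(proximity_edges(points, kind)) / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Assumed setup: 40 points drawn uniformly from the unit hypercube.
    rng = random.Random(0)
    for dim in (2, 10, 50):
        pts = [[rng.random() for _ in range(dim)] for _ in range(40)]
        print(dim, edge_density(pts, "gabriel"), edge_density(pts, "rng"))
```

Running the script prints, for each dimension, the fraction of point pairs that are Gabriel neighbors and the fraction that are relative neighbors. Because the lune contains the Gabriel ball, every RNG edge is also a Gabriel edge, so the second figure can never exceed the first. Under these assumptions one would expect the Gabriel density to rise with dimension, as described above, although the exact figures depend on the sample.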
