On some transformations of high dimension, low sample size data for nearest neighbor classification

For data with more variables than the sample size, phenomena like the concentration of pairwise distances, violation of cluster assumptions, and the presence of hubness often have adverse effects on the performance of the classic nearest neighbor classifier. To cope with such problems, dimension reduction techniques based on random linear projections and principal component directions have been proposed in the literature. In this article, we construct nonlinear transformations of the data based on inter-point distances, which also reduce the data dimension. More importantly, for such high dimension, low sample size data, they enhance separability among the competing classes in the transformed space. When the classic nearest neighbor classifier is used on the transformed data, it usually yields lower misclassification rates. Under appropriate regularity conditions, we derive asymptotic results on the misclassification probabilities of nearest neighbor classifiers based on the $$l_2$$ norm and the $$l_p$$ norms (with $$p \in (0,1]$$) in the transformed space, when the training sample size remains fixed and the dimension of the data grows to infinity. The strength of the proposed transformations in the classification context is demonstrated by analyzing several simulated and benchmark data sets.

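The abstract describes distance-based nonlinear transformations followed by nearest neighbor classification. The sketch below is not the paper's construction; it illustrates one simple distance-based embedding under an assumption: each observation is mapped to its vector of distances to the training points, so the transformed dimension equals the training sample size rather than the original number of variables. The function name `distance_transform` and the toy two-class Gaussian data are hypothetical and for illustration only.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.neighbors import KNeighborsClassifier

def distance_transform(X_train, X):
    """Map each row of X to its vector of l2 distances to the training
    observations (an illustrative distance-based embedding; the transformed
    dimension is the training sample size n, not the original d)."""
    return cdist(X, X_train, metric="euclidean")

# Toy HDLSS setting: d = 1000 variables, n = 20 training observations.
rng = np.random.default_rng(0)
n_per_class, d = 10, 1000
X0 = rng.normal(0.0, 1.0, size=(n_per_class, d))   # class 0
X1 = rng.normal(0.3, 1.0, size=(n_per_class, d))   # class 1 (shifted mean)
X_train = np.vstack([X0, X1])
y_train = np.array([0] * n_per_class + [1] * n_per_class)

X_test = np.vstack([rng.normal(0.0, 1.0, size=(50, d)),
                    rng.normal(0.3, 1.0, size=(50, d))])
y_test = np.array([0] * 50 + [1] * 50)

# 1-NN in the original d-dimensional space vs. in the transformed space.
knn_raw = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
Z_train = distance_transform(X_train, X_train)
Z_test = distance_transform(X_train, X_test)
knn_tr = KNeighborsClassifier(n_neighbors=1).fit(Z_train, y_train)

print("1-NN error, original space:   ", np.mean(knn_raw.predict(X_test) != y_test))
print("1-NN error, transformed space:", np.mean(knn_tr.predict(Z_test) != y_test))
```

Whether the transformed-space classifier actually improves on the raw 1-NN rule depends on the class geometry; the paper's results concern specific transformations and regularity conditions not reproduced in this toy example.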