Local Partial Least Square classifier in high dimensionality classification

A central idea in distance-based machine learning algorithms such as k-nearest neighbors and manifold learning is to choose a set of references, or a neighborhood, based on a distance function to represent the local structure around a query point, and to use that local structure as the basis for constructing models. Local Partial Least Squares (local PLS), the result of applying this neighborhood-based idea to Partial Least Squares (PLS), has been shown to perform very well on regression of small-sample-size data with multicollinearity, but it has seldom been used for high-dimensionality classification. Furthermore, the difference between PLS and local PLS with respect to their optimal intrinsic dimensions is unclear. In this paper we combine local PLS with non-Euclidean distances in order to find out which measures are better suited to high-dimensionality classification. Experimental results on 8 UCI and spectroscopy datasets show that the Euclidean distance is not a good distance function for local PLS classification, especially in high-dimensionality cases; the Manhattan distance and fractional distances are preferred instead. The results further show that the optimal intrinsic dimension of local PLS is smaller than that of standard PLS.
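The procedure the abstract describes can be sketched as follows: for each query, select the k nearest training points under a Minkowski-p distance (p=2 Euclidean, p=1 Manhattan, p&lt;1 fractional), fit a PLS-DA model on that neighborhood alone, and classify by the largest predicted one-hot score. This is a minimal illustrative sketch, not the authors' implementation; the function names, the SVD-based PLS2 fit, and all parameter defaults below are our own assumptions.

```python
import numpy as np

def minkowski(X, q, p):
    # Minkowski-p distance from each row of X to query q; p < 1 gives a
    # "fractional" distance (a dissimilarity, not a true metric).
    return (np.abs(X - q) ** p).sum(axis=1) ** (1.0 / p)

def pls_coefficients(X, Y, n_comp):
    # Minimal PLS2 fit (illustrative, not the paper's algorithm): the weight
    # vector of each component is the dominant left singular vector of X'Y,
    # with X deflated between components.  Returns the regression
    # coefficients B and the centering means for prediction.
    x_mean, y_mean = X.mean(0), Y.mean(0)
    Xc, Yc = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        u, _, _ = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
        w = u[:, 0]
        t = Xc @ w
        tt = t @ t
        if tt < 1e-12:
            break
        p_load = Xc.T @ t / tt
        q_load = Yc.T @ t / tt
        Xc = Xc - np.outer(t, p_load)       # deflate X
        W.append(w); P.append(p_load); Q.append(q_load)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q).T
    B = W @ np.linalg.solve(P.T @ W, Q.T)
    return B, x_mean, y_mean

def local_pls_predict(X_train, y_train, query, k=20, n_comp=2, p=1.0):
    # Classify one query: pick the k nearest neighbors under the chosen
    # Minkowski-p distance, fit PLS-DA on them, return the argmax class.
    idx = np.argsort(minkowski(X_train, query, p))[:k]
    classes = np.unique(y_train)
    Y = (y_train[idx, None] == classes[None, :]).astype(float)  # one-hot
    B, xm, ym = pls_coefficients(X_train[idx], Y, n_comp)
    scores = (query - xm) @ B + ym
    return classes[np.argmax(scores)]
```

Setting `p=1.0` or `p=0.5` here corresponds to the Manhattan and fractional distances the abstract finds preferable to Euclidean (`p=2.0`) in high dimensions.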
