Collaborative representation based classifier with partial least squares regression for the classification of spectral data

Abstract The need for effective methods to classify high-dimensional spectral data is increasing in tasks such as the rapid, non-destructive detection of object features and chemical species using spectroscopy. Partial least squares discriminant analysis (PLS-DA) is an effective, multivariate regression-based method for spectral data classification. Although powerful, PLS-DA suffers from performance degradation under complex conditions such as nonlinearity, class imbalance and multiclass problems, which are common in real-world applications. The collaborative representation-based classifier (CRC) is a more recent machine learning algorithm that represents a query as a linear combination of training samples and classifies the query based on that representation. It offers the possibility of good classification performance even under nonlinearity, class imbalance and multiclass conditions. In this paper, we present a novel method for spectral data classification, namely CRC-WPLS, which combines the benefits of PLS regression and CRC. The method uses PLS regression to search for a weighted linear combination of all training samples that represents the query, and then assigns the query to the class that yields the least approximation error. CRC-WPLS is compared to PLS-DA, kernel PLS-DA, support vector machine (SVM), random forest (RF) and representation-based classifiers on fourteen general machine learning datasets and three spectral datasets. Experimental results show that the proposed method can outperform seven baseline methods in most cases, and can achieve a high classification accuracy (>90%) for low-grade spectra obtained from portable instrumentation.
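The representation-then-residual idea behind CRC can be illustrated with a minimal sketch. This is not the CRC-WPLS method proposed in the paper: it assumes a plain ridge-regularised least squares solver in place of the weighted PLS regression, and it uses the raw per-class reconstruction error as the decision rule. The function name `crc_predict`, the parameter `lam` and the toy data are hypothetical and chosen only for illustration.

```python
import numpy as np


def crc_predict(X_train, y_train, query, lam=1e-2):
    """Classify `query` by a ridge-regularised collaborative representation.

    Illustrative sketch only (ridge solver assumed, not the paper's weighted PLS).
    X_train: (n_samples, n_features), y_train: (n_samples,), query: (n_features,)
    Returns the predicted label and a dict of per-class residuals.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    query = np.asarray(query, dtype=float)

    # Coefficients alpha minimising
    #   ||query - X_train.T @ alpha||^2 + lam * ||alpha||^2
    # over ALL training samples (the "collaborative" part).
    gram = X_train @ X_train.T
    alpha = np.linalg.solve(gram + lam * np.eye(len(X_train)), X_train @ query)

    # Reconstruct the query from each class's samples only and
    # measure the approximation error of that partial reconstruction.
    classes = np.unique(y_train)
    residuals = []
    for c in classes:
        mask = y_train == c
        reconstruction = X_train[mask].T @ alpha[mask]
        residuals.append(float(np.linalg.norm(query - reconstruction)))

    # Assign the class with the least approximation error.
    best = classes[int(np.argmin(residuals))]
    return best, dict(zip(classes.tolist(), residuals))


# Toy usage: two classes of three-channel "spectra" (made-up values).
X = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]
y = [0, 0, 1, 1]
label, errors = crc_predict(X, y, [1.0, 0.05, 0.0])
print(label, errors)
```

In this sketch the coefficients are estimated once from all training samples, and only the reconstruction step is done per class, which is what distinguishes a collaborative representation from fitting a separate model for each class.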
