Supervised Distance Preserving Projections: Applications in the quantitative analysis of diesel fuels and light cycle oils from NIR spectra

Abstract

In this work, we discuss a recently proposed approach to supervised dimensionality reduction, the Supervised Distance Preserving Projection (SDPP), and investigate its applicability to monitoring material properties from spectroscopic observations. Motivated by continuity preservation, the SDPP is a linear projection method in which the proximity relations between points in the low-dimensional subspace mimic the proximity relations between the corresponding points in the response space. Such a projection facilitates the design of efficient regression models, and it may also uncover information that is useful for visualisation. An experimental evaluation is conducted to assess the performance of the SDPP and to compare it with a number of state-of-the-art approaches to unsupervised and supervised dimensionality reduction. The regression step after projection is performed with computationally light models that have a low maintenance cost, namely Multiple Linear Regression and Locally Linear Regression on k-NN neighbourhoods. For the evaluation, a benchmark problem and a full-scale calibration problem are discussed. The case studies pertain to the estimation of several chemico-physical properties of diesel fuels and light cycle oils from near-infrared spectra. The experimental results indicate that the SDPP yields parsimonious projections that can be used to design light, yet accurate, estimation models.
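
The core idea of the SDPP can be illustrated with a small NumPy sketch. It rests on several assumptions. The objective is taken to be the squared mismatch between projected and response squared distances, summed over k-nearest-neighbour pairs in input space. The projection matrix `W` is fitted by plain gradient descent with step halving, which stands in for whatever solver the authors actually use. The function names (`knn_pairs`, `sdpp_objective`, `fit_sdpp`), the synthetic data, and all parameter values are hypothetical and chosen only for illustration. The final lines fit a Multiple Linear Regression on the projected scores.

```python
import numpy as np


def knn_pairs(X, k):
    """Index pairs (i, j) such that x_j is one of the k nearest neighbours of x_i."""
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared Euclidean distances
    np.fill_diagonal(D2, np.inf)                     # a point is not its own neighbour
    nbrs = np.argsort(D2, axis=1)[:, :k]
    rows = np.repeat(np.arange(X.shape[0]), k)
    return rows, nbrs.ravel()


def sdpp_objective(W, X, Y, rows, cols):
    """Assumed objective J(W) and its gradient over the neighbour pairs."""
    T = X[rows] - X[cols]                             # tau_ij = x_i - x_j
    TW = T @ W                                        # W^T tau_ij, one row per pair
    d2 = np.sum(TW**2, axis=1)                        # projected squared distances
    delta2 = np.sum((Y[rows] - Y[cols])**2, axis=1)   # response squared distances
    e = d2 - delta2
    n = X.shape[0]
    J = np.sum(e**2) / n
    G = (4.0 / n) * T.T @ (e[:, None] * TW)           # dJ/dW = (4/n) sum e_ij tau tau^T W
    return J, G


def fit_sdpp(X, Y, d=1, k=10, n_iter=300, lr=1.0, seed=0):
    """Gradient descent with step halving (an illustrative stand-in solver)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(X.shape[1], d))
    rows, cols = knn_pairs(X, k)
    J, G = sdpp_objective(W, X, Y, rows, cols)
    history = [J]
    for _ in range(n_iter):
        step = lr
        J_new, G_new = sdpp_objective(W - step * G, X, Y, rows, cols)
        # Halve the step until J does not increase; "not <=" also rejects NaN.
        while not J_new <= J and step > 1e-12:
            step *= 0.5
            J_new, G_new = sdpp_objective(W - step * G, X, Y, rows, cols)
        if not J_new <= J:
            break                                     # no descent step found
        W, J, G = W - step * G, J_new, G_new
        history.append(J)
    return W, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 10))                    # synthetic stand-in for spectra
    w_true = np.zeros(10)
    w_true[:2] = [0.6, 0.8]                           # hypothetical response direction
    Y = (X @ w_true)[:, None]
    W, hist = fit_sdpp(X, Y, d=1, k=10)
    Z = X @ W                                         # low-dimensional scores
    A = np.hstack([np.ones((X.shape[0], 1)), Z])      # MLR design matrix with intercept
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    rmse = np.sqrt(np.mean((A @ coef - Y) ** 2))
    print(f"objective {hist[0]:.3g} -> {hist[-1]:.3g}, MLR training RMSE {rmse:.3g}")
```

The SDPP objective depends only on distances, so the sign and scale of the projected scores are arbitrary. The downstream regression step absorbs them, which is one reason a light model such as MLR is enough after projection.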
