Multiresolution interval partial least squares: A framework for waveband selection and resolution optimization

Abstract Spectroscopic data generated by process analytical technologies (PAT) are routinely used for the rapid assessment of quality properties in many industrial sectors, such as agrofood, beverages, pharmaceuticals, chemicals, and pulp and paper. While a spectrum can easily provide hundreds of measurements across many wavelengths, only a fraction of it conveys information relevant to predicting the property of interest. The performance of current models therefore depends strongly on the ability to select the key wavebands, and prior knowledge to guide that selection is not always available. For this reason, several feature selection procedures based on variants of interval partial least squares (iPLS) have been proposed. These methodologies, however, focus solely on determining the most relevant wavebands and do not attempt to further enhance the prediction capability within each interval. Standard full-spectrum models, on the other hand, are often improved by reducing the spectral resolution, but this operation has not yet been integrated with waveband selection. As spectral aggregation can effectively improve modelling performance, a multiresolution selection algorithm that simultaneously selects the most relevant wavebands and their optimal resolution is proposed here. By design, this methodology leads to prediction models that are at least as good as the full-spectrum models. A performance comparison on simulated data and on real NIR spectra of gasoline samples also shows that the proposed methodology outperforms iPLS and its forward and backward interval selection variants in a statistically significant way.
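To make the idea of spectral aggregation concrete, the following is a minimal Python sketch of what reducing the resolution inside selected wavebands could look like. It is an illustration only, not the algorithm proposed in the paper: the function names `aggregate_interval` and `multiresolution_features`, the use of a plain average as the aggregation operator, and the hand-picked bands and widths are all assumptions made for this example. How the wavebands and their resolutions are actually selected is not described in the abstract.

```python
from statistics import mean


def aggregate_interval(spectrum, start, stop, width):
    """Average consecutive runs of `width` points within spectrum[start:stop].

    A trailing run shorter than `width` is averaged over the points it has.
    Averaging is an assumed aggregation operator, chosen for illustration.
    """
    if width < 1:
        raise ValueError("width must be >= 1")
    segment = spectrum[start:stop]
    return [mean(segment[i:i + width]) for i in range(0, len(segment), width)]


def multiresolution_features(spectrum, bands):
    """Concatenate the aggregated points of each (start, stop, width) band.

    `bands` is a hypothetical description of the selected wavebands and the
    resolution assigned to each; wavelengths outside every band are dropped.
    """
    features = []
    for start, stop, width in bands:
        features.extend(aggregate_interval(spectrum, start, stop, width))
    return features


# Toy spectrum with eight wavelengths (made-up values, not real data).
spectrum = [1.0, 3.0, 5.0, 7.0, 2.0, 2.0, 4.0, 4.0]

# First band kept at half resolution, second band collapsed to a single point.
bands = [(0, 4, 2), (4, 8, 4)]

features = multiresolution_features(spectrum, bands)
print(features)  # [2.0, 6.0, 3.0]
```

In a sketch like this, the reduced feature vector would then be passed to a regression model such as PLS in place of the full spectrum. Setting a band's width to 1 keeps its original resolution.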
