A data-driven near infrared calibration process including near infrared spectral thumbprints

Near-infrared spectra are highly correlated, complex and noisy, and they potentially contain many more predictor variables than are needed to estimate a parsimonious calibration equation. The implications of the pre-processing choices made during calibration are difficult to appreciate, especially with respect to the relationship between the transformed data and the reference values. Graphical methods can help to clarify these relationships, so that decisions made during the calibration process can be based on the data alone. In this paper, new graphical tools are introduced to help the researcher better understand these complex relationships in the data. Combined with the proposed algorithm for exploring spectra in relation to calibration, these tools enable a parsimonious calibration model to be formed. Results from two near-infrared data sets (diesel and wheat) show that successful calibration equations can be formed using the proposed algorithm, which includes the two new graphical tools. The results obtained with the different transformations considered are highly correlated, suggesting that, in terms of parsimony, developing the calibration from the raw spectra could be the most judicious choice.
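The abstract refers to the relationship between transformed spectra and the reference values under different pre-processing choices, but it does not give the proposed algorithm or define the thumbprint plots. As an illustration only, and not as the authors' method, the Python sketch below computes a per-wavelength correlation spectrum (the Pearson correlation between each wavelength and the reference values) for raw spectra and for spectra transformed with the standard normal variate (SNV), then correlates the two correlation spectra with each other. SNV as the example transformation, the function names (`pearson`, `snv`, `correlation_spectrum`) and the synthetic data are all assumptions made for this sketch.

```python
"""Correlation spectra for raw and SNV-transformed spectra (illustrative sketch).

Assumptions: spectra are lists of equal-length lists (samples x wavelengths),
there is one reference value per sample, and SNV stands in for "a
pre-processing choice". This is not the algorithm proposed in the paper.
"""
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation between two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum by its own mean and SD."""
    m, s = mean(spectrum), stdev(spectrum)
    return [(v - m) / s for v in spectrum]


def correlation_spectrum(spectra, reference):
    """Correlation between the reference values and each wavelength column."""
    return [
        pearson([row[j] for row in spectra], reference)
        for j in range(len(spectra[0]))
    ]


# Hypothetical synthetic data: 6 samples x 5 wavelengths.
reference = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
band = [0.0, 0.2, 1.0, 0.2, 0.0]               # absorption band scaled by the reference value
offsets = [0.1, 0.5, 0.2, 0.4, 0.3, 0.6]       # additive baseline per sample
slopes = [0.05, 0.02, 0.08, 0.03, 0.07, 0.04]  # baseline tilt per sample
spectra = [
    [o + r * b + s * j for j, b in enumerate(band)]
    for r, o, s in zip(reference, offsets, slopes)
]

raw_cs = correlation_spectrum(spectra, reference)
snv_cs = correlation_spectrum([snv(row) for row in spectra], reference)
agreement = pearson(raw_cs, snv_cs)  # similarity of the two correlation spectra

print("raw:", [round(r, 3) for r in raw_cs])
print("snv:", [round(r, 3) for r in snv_cs])
print("agreement:", round(agreement, 3))
```

A high `agreement` value would indicate that the transformation leaves the wavelength-by-wavelength relationship with the reference values largely unchanged, which is the kind of observation the abstract uses to argue for calibrating on the raw spectra. The synthetic numbers here say nothing about the diesel or wheat data.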
