Enhanced predictions of wood properties using hybrid models of PCR and PLS with high-dimensional NIR spectral data

Near infrared (NIR) spectroscopy is a rapid, non-destructive technology for predicting a variety of wood properties, and it offers opportunities to optimize manufacturing processes through in-line assessment of forest products. In this paper, we propose a novel multivariate regression procedure, the hybrid model of principal component regression (PCR) and partial least squares (PLS), to develop more accurate prediction models for high-dimensional NIR spectral data. To combine the merits of PCR and PLS, hybrid models use both the principal components defined in PCR and the latent variables defined in PLS, extracted by a common iterative procedure under the constraint that they remain orthogonal to one another. In addition, we propose a modified sequential forward floating search method, which originated in feature selection for classification problems, to overcome the difficulty of searching the vast number of possible hybrid models. The effectiveness and efficiency of hybrid models are substantiated by experiments on three real-life datasets of forest products. The proposed hybrid approach is applicable to a wide range of problems involving high-dimensional spectral data.
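The idea of mixing principal components and PLS latent variables under an orthogonality constraint can be illustrated with a minimal sketch. This is not the procedure from the paper, whose details the abstract does not give. It assumes a single response, a deflation step that projects each new score vector out of the spectra, and a hypothetical encoding in which a string such as "CLL" means one principal component followed by two PLS latent variables. The function names and the synthetic data are likewise illustrative assumptions.

```python
import numpy as np


def hybrid_scores(X, y, sequence):
    """Return one score vector per character of `sequence` (hypothetical encoding).

    'C' extracts a principal-component direction, 'L' a PLS latent-variable
    direction. Each is computed on the deflated spectra, so every new score
    is orthogonal to all earlier ones.
    """
    Xr = X - X.mean(axis=0)  # centered working copy, deflated at each step
    yc = y - y.mean()
    scores = []
    for kind in sequence:
        if kind == "C":
            # Leading right singular vector: direction of largest remaining variance.
            _, _, vt = np.linalg.svd(Xr, full_matrices=False)
            w = vt[0]
        elif kind == "L":
            # Direction of largest covariance between remaining spectra and response.
            w = Xr.T @ yc
            w = w / np.linalg.norm(w)
        else:
            raise ValueError(f"unknown component type: {kind!r}")
        t = Xr @ w
        # Deflation: remove the part of Xr explained by t.
        Xr = Xr - np.outer(t, t @ Xr) / (t @ t)
        scores.append(t)
    return np.column_stack(scores)


def fit_hybrid(X, y, sequence):
    """Regress the centered response on the hybrid scores; return scores and fit."""
    T = hybrid_scores(X, y, sequence)
    q, *_ = np.linalg.lstsq(T, y - y.mean(), rcond=None)
    return T, y.mean() + T @ q


# Synthetic stand-in for spectra: 40 samples, 200 wavelengths (assumed data).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=40)

T, fitted = fit_hybrid(X, y, "CLL")
gram = T.T @ T  # off-diagonal entries should be numerically zero
rss_hybrid = float(np.sum((y - fitted) ** 2))
rss_single = float(np.sum((y - fit_hybrid(X, y, "C")[1]) ** 2))
```

With k components there are 2^k possible sequences in this encoding, which is why a search strategy such as the floating search mentioned in the abstract becomes necessary as k grows. In practice each candidate sequence would be compared by cross-validated prediction error rather than by the in-sample residual computed here.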
