A simple ensemble strategy of uninformative variable elimination and partial least-squares for near-infrared spectroscopic calibration of pharmaceutical products

Abstract

Based on uninformative variable elimination (UVE) and partial least squares (PLS), a simple ensemble strategy named EUVEPLS is proposed for multivariate calibration of near-infrared (NIR) spectroscopy. In this strategy, different calibration sets are first formed by randomly selecting a fixed number of objects from the available calibration data with replacement; in this manner, different PLS models can be obtained, as in the jackknife procedure. Then, UVE is used to shrink the original variable space into a specific subspace, in which a PLS model with the optimal number of latent variables is determined by cross-validation. This process is repeated until a fixed number of candidate models, and therefore the same number of subspaces, are obtained. Finally, the better-performing candidate models are combined into an ensemble model by simple or weighted averaging. Two NIR spectral datasets of pharmaceutical products are used to verify the accuracy and robustness of EUVEPLS. The results confirm the superiority of EUVEPLS over two single-model reference methods, UVEPLS and full-spectrum PLS.
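The workflow summarized above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions the abstract does not specify: PLS1 is computed with NIPALS; UVE uses leave-one-out (jackknife) PLS on the data with appended random noise columns of amplitude 1e-10, computes the reliability c_j = mean(b_j)/std(b_j) of each regression coefficient, and keeps the variables whose |c_j| exceeds the largest |c| among the noise columns (one common cutoff choice); the number of latent variables used inside UVE (`uve_lv`) is fixed; "better performance" is taken to mean a lower 5-fold RMSECV on the resampled set; and the weighted average uses inverse squared RMSECV weights. All function and parameter names (`pls1_fit`, `cv_rmse`, `uve_select`, `euvepls_fit`, `euvepls_predict`, `n_keep`, `uve_lv`, and so on) are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """NIPALS PLS1. Returns (coef, x_mean, y_mean) with y_hat = (X - x_mean) @ coef + y_mean."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        w = Xc.T @ yc
        w /= np.linalg.norm(w)                      # weight vector
        t = Xc @ w                                  # score vector
        tt = t @ t
        p, q = Xc.T @ t / tt, yc @ t / tt           # X loading and y loading
        Xc, yc = Xc - np.outer(t, p), yc - q * t    # deflate X and y
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    return W @ np.linalg.solve(P.T @ W, Q), x_mean, y_mean   # B = W (P'W)^-1 q


def cv_rmse(X, y, n_comp, k=5):
    """k-fold RMSECV with contiguous folds (an illustrative assumption)."""
    idx = np.arange(len(y))
    sq = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        coef, xm, ym = pls1_fit(X[train], y[train], n_comp)
        sq += np.sum(((X[fold] - xm) @ coef + ym - y[fold]) ** 2)
    return np.sqrt(sq / len(y))


def uve_select(X, y, n_comp, rng, noise_scale=1e-10):
    """UVE sketch: append tiny noise columns, run leave-one-out PLS, and keep variables
    whose |mean(b)/std(b)| beats every noise column (assumed cutoff rule)."""
    n, p = X.shape
    Xa = np.hstack([X, noise_scale * rng.uniform(size=(n, p))])
    B = np.array([pls1_fit(np.delete(Xa, i, axis=0), np.delete(y, i), n_comp)[0]
                  for i in range(n)])
    c = B.mean(axis=0) / B.std(axis=0, ddof=1)      # reliability of each coefficient
    return np.flatnonzero(np.abs(c[:p]) > np.abs(c[p:]).max())


def euvepls_fit(X, y, n_models=20, n_sub=None, n_keep=10, max_lv=8, uve_lv=3,
                weighted=False, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    n_sub = n_sub or n                              # subset size: assumed default
    candidates = []
    for _ in range(n_models):
        rows = rng.choice(n, size=n_sub, replace=True)   # resample objects with replacement
        Xs, ys = X[rows], y[rows]
        sel = uve_select(Xs, ys, uve_lv, rng)            # UVE subspace for this candidate
        if sel.size == 0:                                # assumed fallback: full spectrum
            sel = np.arange(X.shape[1])
        errs = [cv_rmse(Xs[:, sel], ys, a) for a in range(1, min(max_lv, sel.size) + 1)]
        best = int(np.argmin(errs))
        coef, xm, ym = pls1_fit(Xs[:, sel], ys, best + 1)  # refit with CV-optimal LVs
        candidates.append((errs[best], sel, coef, xm, ym))
    # Assumption: "better" candidates = lowest RMSECV; weights = 1 / RMSECV^2.
    kept = sorted(candidates, key=lambda m: m[0])[:n_keep]
    rmse = np.array([m[0] for m in kept])
    w = 1.0 / rmse**2 if weighted else np.ones(len(kept))
    return kept, w / w.sum()


def euvepls_predict(model, X):
    kept, w = model
    preds = np.array([(X[:, sel] - xm) @ coef + ym for _, sel, coef, xm, ym in kept])
    return w @ preds                                # simple or weighted average of members


# Usage on synthetic data (not the pharmaceutical datasets from the paper).
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 30))
y = X[:, :5] @ np.array([1.0, 2.0, -1.0, 0.5, 1.5]) + 0.05 * rng.normal(size=40)
model = euvepls_fit(X, y, n_models=10, n_keep=5, weighted=True)
y_hat = euvepls_predict(model, X)
print("training RMSE:", np.sqrt(np.mean((y_hat - y) ** 2)))
```

With `weighted=False` every retained member gets the same weight, which corresponds to the simple-average variant. The synthetic usage lines only exercise the code; they say nothing about the results reported for the pharmaceutical datasets.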
