Comparison of variable selection methods for PLS-based soft sensor modeling

Abstract

Data-driven soft sensors have been widely used in both academic research and industrial applications to predict hard-to-measure variables or to replace physical sensors and reduce cost. Previous work has shown that the performance of these soft sensors can be greatly improved by selecting only the key variables that strongly affect the primary variables, rather than using all available process variables. In this work, a comprehensive evaluation of different variable selection methods for PLS-based soft sensor development is presented, and a new metric is proposed to assess their performance. Seven variable selection methods are compared: stepwise regression (SR), partial least squares with regression coefficients (PLS-BETA), PLS with variable importance in projection (PLS-VIP), uninformative variable elimination with PLS (UVE-PLS), genetic algorithm with PLS (GA-PLS), least absolute shrinkage and selection operator (Lasso), and competitive adaptive reweighted sampling with PLS (CARS-PLS). Their strengths and limitations for soft sensor development are demonstrated with a simulated case study and an industrial case study.
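
To make one of the compared methods concrete, the following is a minimal sketch of PLS-VIP variable selection, assuming a single primary variable, autoscaled inputs and the common "VIP > 1" rule of thumb. The cutoff, the number of latent components and the function names (`pls1_nipals`, `vip_scores`, `select_by_vip`) are illustrative assumptions, not necessarily the settings used in this work. PLS is fitted here with a hand-written NIPALS loop rather than a library, and the paper's proposed performance metric is not reproduced.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Single-response PLS via NIPALS (hypothetical helper for illustration).

    X (n x p) and y (n,) are assumed to be centered already.
    Returns weights W (p x A), scores T (n x A) and y-loadings q (A,).
    """
    X = np.asarray(X, dtype=float).copy()
    y = np.asarray(y, dtype=float).copy()
    n, p = X.shape
    W = np.zeros((p, n_components))
    T = np.zeros((n, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y                    # direction of maximum covariance with y
        w /= np.linalg.norm(w)
        t = X @ w                      # latent score vector
        tt = t @ t
        p_load = X.T @ t / tt          # X-loading
        q[a] = y @ t / tt              # inner regression coefficient
        X -= np.outer(t, p_load)       # deflate X
        y -= q[a] * t                  # deflate y
        W[:, a], T[:, a] = w, t
    return W, T, q


def vip_scores(W, T, q):
    """Variable importance in projection; sum of squared VIPs equals p."""
    p = W.shape[0]
    ss = q**2 * np.sum(T**2, axis=0)   # y-variance explained per component
    w_norm = W / np.linalg.norm(W, axis=0)
    return np.sqrt(p * (w_norm**2 @ ss) / ss.sum())


def select_by_vip(X, y, n_components=2, threshold=1.0):
    """Autoscale X, center y, fit PLS1 and keep variables with VIP > threshold."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yc = y - y.mean()
    W, T, q = pls1_nipals(Xs, yc, n_components)
    vip = vip_scores(W, T, q)
    return np.flatnonzero(vip > threshold), vip


# Simulated example: only the first two of six inputs drive the output.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
y = 3 * X[:, 0] + 2 * X[:, 1] + 0.1 * rng.standard_normal(200)
selected, vip = select_by_vip(X, y)
print("selected variables:", selected)
print("VIP scores:", np.round(vip, 2))
```

Because the squared VIP scores always sum to the number of variables, a cutoff of 1 keeps the variables whose importance is above average. This is also why the threshold is a heuristic that may need tuning for a given process.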
