Feature selection based on concurrent projection to latent structures for high dimensional spectra data

Dimension reduction is an important issue in building data-driven soft sensor models, especially for high-dimensional spectral data. It is necessary to select effective input features that give the model a clear physical interpretation and a simple structure. In this paper, a new feature selection method based on the concurrent projection to latent structures (CPLS) algorithm is proposed to select important input features for modeling spectral data. Because of the special characteristics of spectral data, the proposed method selects input features from unscaled data. A modified simple sphere criterion (SSC) is used to select the input features relevant to the output data using only one latent variable. The final soft sensor model is then constructed from the scaled selected data using the partial least squares (PLS) algorithm. Near-infrared (NIR) spectra and mechanical frequency spectra are used to validate the proposed method. Simulation results show that the proposed approach has better generalization performance than the previously proposed PLS-based method.
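
The two-stage workflow described above (select features from unscaled data using a single latent variable, then fit a PLS model on the scaled selected features) can be illustrated with a minimal Python sketch. It is not the CPLS algorithm and does not reproduce the modified SSC, since neither is specified in this abstract. As stand-ins, it uses the first ordinary PLS weight vector and a hypothetical magnitude threshold (`ratio`). The function names, the synthetic data, and all parameter values are assumptions made only for illustration.

```python
import numpy as np

def first_pls_weight(X, y):
    """First PLS weight vector on mean-centered but unscaled data."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    return w / np.linalg.norm(w)

def select_features(X, y, ratio=0.2):
    """Hypothetical rule (NOT the paper's SSC): keep features whose
    absolute weight is at least `ratio` times the largest absolute weight."""
    w = np.abs(first_pls_weight(X, y))
    return np.flatnonzero(w >= ratio * w.max())

def fit_pls1(X, y, n_components=2):
    """Single-output PLS (NIPALS) on autoscaled inputs; returns a predictor."""
    x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
    y_mean = y.mean()
    E = (X - x_mean) / x_std          # scaled input residual
    f = y - y_mean                    # centered output residual
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w = w / np.linalg.norm(w)     # weight vector
        t = E @ w                     # score vector
        tt = t @ t
        p = E.T @ t / tt              # input loading
        c = f @ t / tt                # output loading
        E = E - np.outer(t, p)        # deflate inputs
        f = f - c * t                 # deflate output
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)   # regression coefficients

    def predict(X_new):
        return ((X_new - x_mean) / x_std) @ beta + y_mean
    return predict

# Synthetic "spectrum" (assumed data): 50 channels, of which channels
# 10..14 have large amplitude and drive the output.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))
X[:, 10:15] *= 5.0
y = X[:, 10:15].sum(axis=1) + 0.1 * rng.normal(size=300)
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]

# Stage 1: select on unscaled data. Stage 2: PLS on scaled selected data.
selected = select_features(X_train, y_train)
predict = fit_pls1(X_train[:, selected], y_train, n_components=2)
rmse = float(np.sqrt(np.mean((predict(X_test[:, selected]) - y_test) ** 2)))
print("selected channels:", selected.tolist())
print("test RMSE:", rmse)
```

Selecting on unscaled data lets high-amplitude spectral channels dominate the first weight vector, which is one plausible reading of why the abstract avoids scaling at the selection stage. Scaling is applied only when the final model is fitted.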
