Selection and transformation of input variables for RVM based on MI–PCA–MI and 4‐CBA concentration model

Because a chemical process is a highly nonlinear system with high-dimensional, complex variables, a relevance vector machine (RVM) with a novel scheme for selecting and transforming the candidate input variables is proposed. The scheme combines principal component analysis (PCA) with mutual information (MI), and the resulting method is named MI–PCA–MI–RVM. First, to discard irrelevant candidates, input variables are selected according to their degree of correlation with the output variable, as measured by MI. Second, to eliminate multicollinearity among the selected input variables, PCA is applied to transform them into principal components (PCs). Third, the PCs are reordered according to their MI with the output variable and denoted MI–PCs, and the optimal MI–PCs are chosen as the input variables for the RVM according to prediction performance. Finally, MI–PCA–MI–RVM is employed to develop a model of 4-carboxybenzaldehyde (4-CBA) concentration. The results show that the MI–PCA–MI–RVM model predicts more accurately and more robustly than the PCA–RVM and RVM models. © 2012 Curtin University of Technology and John Wiley & Sons, Ltd.
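The three steps can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`mutual_info`, `mi_pca_mi`, `choose_n_pcs`), the fixed-bin histogram MI estimator, the bin count, the synthetic data, and the train/validation split are all hypothetical, and an ordinary least-squares fit stands in for the RVM purely to show how the number of MI–PCs might be chosen by prediction error.

```python
import numpy as np

def mutual_info(x, y, bins=8):
    """Histogram estimate of the MI (in nats) between two 1-D samples.
    Assumption: a plain fixed-bin histogram is used as the estimator."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def mi_pca_mi(X, y, n_keep, bins=8):
    # Step 1: keep the n_keep candidate inputs with the highest MI to y.
    mi_x = np.array([mutual_info(X[:, j], y, bins) for j in range(X.shape[1])])
    selected = np.sort(np.argsort(mi_x)[::-1][:n_keep])
    # Step 2: PCA (via SVD) on the standardized selected inputs.
    Xs = X[:, selected]
    Z = (Xs - Xs.mean(axis=0)) / Xs.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    pcs = Z @ Vt.T
    # Step 3: reorder the PCs by their MI to y (the "MI-PCs").
    mi_pc = np.array([mutual_info(pcs[:, k], y, bins) for k in range(pcs.shape[1])])
    order = np.argsort(mi_pc)[::-1]
    return selected, pcs[:, order], mi_pc[order]

def choose_n_pcs(mi_pcs, y, n_train):
    """Pick how many leading MI-PCs to use by validation RMSE.
    Assumption: least squares is a stand-in for the RVM regressor."""
    errors = []
    for k in range(1, mi_pcs.shape[1] + 1):
        A = np.column_stack([np.ones(len(y)), mi_pcs[:, :k]])
        w, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
        resid = A[n_train:] @ w - y[n_train:]
        errors.append(float(np.sqrt(np.mean(resid ** 2))))
    return int(np.argmin(errors)) + 1, errors

# Synthetic demo data (hypothetical): columns 0 and 1 are nearly collinear,
# column 2 is relevant, columns 3-5 are irrelevant noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 6))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=600)
y = np.sin(X[:, 0]) + 0.5 * X[:, 2] + 0.05 * rng.normal(size=600)

selected, mi_pcs, mi_sorted = mi_pca_mi(X, y, n_keep=3)
best_k, val_errors = choose_n_pcs(mi_pcs, y, n_train=400)
print("selected inputs:", selected)
print("MI of reordered PCs:", mi_sorted)
print("chosen number of MI-PCs:", best_k)
```

For brevity the PCA here is fitted on all samples, including the validation rows; in practice the standardization and loadings would normally be estimated on the training rows only and then applied to held-out data.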
