Transforming input variables for RBFN based on PCA-ASH multivariate correlation analysis and its application

Mutual information (MI) estimated with the averaged shifted histogram (ASH) probability density estimator is considered a good indicator of the relevance between input variables and the output variable. However, it cannot handle redundant input variables. Therefore, a method that integrates principal component analysis (PCA) with MI is proposed to improve the prediction performance of the radial basis function network (RBFN). First, PCA is used to extract principal components (PCs) from the original variables; these PCs are mutually uncorrelated. Second, ASH-based MI is applied to select the PCs most strongly related to the output variable as the new input variables. Finally, PCA-ASH-RBFN is used to develop a housing price model on the Boston housing data set. The results show that PCA-ASH-RBFN predicts better and more robustly than PCA-RBFN and than RBFN combined with robust feature selection of the input variables.
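The three steps described above (PCA, ASH-based MI ranking of the PCs, RBFN fitting) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the bin count `bins`, the number of shifts `m`, the random choice of RBF centers, the width heuristic, and the synthetic data are all assumptions made for illustration, and the MI value is a simple plug-in estimate computed on the smoothed histogram grid.

```python
import numpy as np

def pca_scores(X):
    """Standardize the columns of X and return its principal-component scores."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt.T  # columns are mutually uncorrelated

def ash_mutual_information(x, y, bins=10, m=5):
    """Plug-in MI estimate (in nats) from a 2-D averaged shifted histogram."""
    counts, _, _ = np.histogram2d(x, y, bins=bins * m)  # fine grid, width h/m
    w = 1.0 - np.abs(np.arange(1 - m, m)) / m           # triangular ASH weights
    smooth = np.apply_along_axis(np.convolve, 0, counts, w, mode="same")
    smooth = np.apply_along_axis(np.convolve, 1, smooth, w, mode="same")
    p_xy = smooth / smooth.sum()                        # joint cell probabilities
    p_x = p_xy.sum(axis=1, keepdims=True)               # marginal of x
    p_y = p_xy.sum(axis=0, keepdims=True)               # marginal of y
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x * p_y)[mask])))

def select_components(scores, y, k):
    """Return indices of the k PCs with the largest estimated MI with y."""
    mi = np.array([ash_mutual_information(scores[:, j], y)
                   for j in range(scores.shape[1])])
    return np.argsort(mi)[::-1][:k], mi

def rbf_design(X, centers, width):
    """Gaussian basis activations plus a constant bias column."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.hstack([np.exp(-d2 / (2.0 * width ** 2)), np.ones((len(X), 1))])

def fit_rbfn(X, y, n_centers=20, seed=0):
    """Random training points as centers; output weights by least squares."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=n_centers, replace=False)]
    d_max = np.max(np.linalg.norm(centers[:, None] - centers[None], axis=-1))
    width = d_max / np.sqrt(2.0 * n_centers)  # heuristic width, an assumption
    weights, *_ = np.linalg.lstsq(rbf_design(X, centers, width), y, rcond=None)
    return centers, width, weights

def predict_rbfn(model, X):
    """Evaluate a fitted (centers, width, weights) model on new inputs."""
    centers, width, weights = model
    return rbf_design(X, centers, width) @ weights

# Synthetic data for illustration only (not the Boston housing data set).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
X = np.hstack([X, X[:, :1] + 0.1 * rng.normal(size=(400, 1))])  # redundant input
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=400)

scores = pca_scores(X)
keep, mi = select_components(scores, y, k=3)
model = fit_rbfn(scores[:, keep], y)
train_mse = np.mean((predict_rbfn(model, scores[:, keep]) - y) ** 2)
print("selected PCs:", keep, "training MSE:", round(train_mse, 4))
```

Because the PCs are uncorrelated by construction, ranking them one at a time by MI sidesteps the redundancy that affects MI ranking of the raw inputs. In practice the number of retained PCs `k` and the ASH settings would need to be tuned, for example by cross-validation.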
