A Comparison of Variable Selection Methods with the Main Focus on Orthogonalization

Linear regression models are widely used to approximate multidimensional data. When the number of input dimensions is very high, dimension reduction methods have to be applied for several reasons. Variable selection (in the literature also called subset selection) is a special kind of dimension reduction: it tries to find the smallest set of regressors that best approximates a given dependent variable. Several methods for selecting these regressor variables exist, including forward selection, backward elimination, and stepwise regression; see [7] and [8]. In this paper, we assume that the number of regressor candidates is very high, while the number of variables in the final linear regression model is quite low. Under these circumstances, several methods cannot be used because their computational effort is too high. Variants of forward selection are therefore preferred, as they are the fastest variable selection methods. In particular, two variants of forward selection with orthogonalization are investigated, because orthogonalization improves the quality of the result.
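
To make the idea of forward selection with orthogonalization concrete, the sketch below shows a greedy selection loop in the spirit of orthogonal least squares: at each step, every remaining candidate regressor is orthogonalized against the regressors already selected, and the candidate that explains the most residual output variance is added. This is only an illustrative sketch under assumed names (forward_selection_orthogonal, X, y, n_select); it is not the exact procedure compared in the paper.

    import numpy as np

    def forward_selection_orthogonal(X, y, n_select):
        # Greedy forward selection with Gram-Schmidt orthogonalization
        # (an orthogonal-least-squares style sketch, not the paper's algorithm).
        # X: (n_samples, n_candidates) regressor candidates, y: (n_samples,) target.
        n_samples, n_candidates = X.shape
        selected = []                      # indices of chosen regressors
        Q = np.empty((n_samples, 0))       # orthonormalized selected regressors

        for _ in range(n_select):
            best_idx, best_gain, best_q = None, -np.inf, None
            for j in range(n_candidates):
                if j in selected:
                    continue
                # Remove the part of candidate j already explained by the
                # selected regressors (Q @ (Q.T @ x) is zero while Q is empty).
                q = X[:, j] - Q @ (Q.T @ X[:, j])
                norm = np.linalg.norm(q)
                if norm < 1e-10:           # skip (nearly) linearly dependent candidates
                    continue
                q = q / norm
                gain = (q @ y) ** 2        # extra output variance explained by this direction
                if gain > best_gain:
                    best_idx, best_gain, best_q = j, gain, q
            if best_idx is None:
                break                      # no usable candidate left
            selected.append(best_idx)
            Q = np.column_stack([Q, best_q])
        return selected

On synthetic data where, say, only two of fifty candidates carry information about y, such a loop typically recovers exactly those two columns; once the indices are found, the regression coefficients can be re-estimated by ordinary least squares on the original (non-orthogonalized) regressors.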
