On Multicollinearity and Concurvity in Some Nonlinear Multivariate Models

Recent developments in multivariate smoothing methods provide a rich collection of feasible models for nonparametric multivariate data analysis. Among the most interpretable are those with smoothed additive terms. The literature in this area has been mainly concerned with constructing methods and algorithms for computing these models. Fewer results are available on validating the computed fit, and many applications of nonparametric methods end up computing and comparing the generalized validation error or related indexes. This article reviews the behaviour of some of the best-known multivariate nonparametric methods, based on subset selection and on projection, when exact collinearity or multicollinearity (near collinearity) is present in the input matrix. It shows the possible aliasing effects in the fits computed by some selection methods and explores the properties of the projection spaces reached by projection methods, in order to help data analysts select the best model when the input matrix is ill-conditioned. Two simulation studies and an application to a real data set further illustrate the effects of collinearity and multicollinearity on the fit.
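As a minimal sketch (not the article's own procedure), the difference between exact collinearity and multicollinearity in an input matrix can be checked numerically with Belsley-style condition indexes: the largest singular value of the column-scaled matrix divided by each singular value. The simulated data, the helper name `condition_indexes`, and the commonly quoted rule of thumb that indexes above roughly 30 signal strong near dependencies are assumptions for illustration. The last lines show one simple form of aliasing: under exact collinearity, two different variable subsets span the same column space and give identical least squares fits.

```python
import numpy as np


def condition_indexes(X):
    """Belsley-style condition indexes of a design matrix X (n x p).

    Hypothetical helper for illustration. Columns are scaled to unit
    Euclidean length (without centering) before the SVD, so the indexes
    do not depend on the measurement units of the inputs.
    """
    X = np.asarray(X, dtype=float)
    Xs = X / np.linalg.norm(X, axis=0)           # unit-length columns
    s = np.linalg.svd(Xs, compute_uv=False)      # singular values, descending
    with np.errstate(divide="ignore"):
        return s[0] / s                           # eta_k = s_max / s_k


# Simulated inputs (assumed for illustration only).
rng = np.random.default_rng(0)
n = 200
intercept = np.ones(n)
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Well-conditioned inputs: two independently generated predictors.
X_ok = np.column_stack([intercept, x1, x2])
# Multicollinearity (near collinearity): fourth column is x1 plus a little noise.
X_near = np.column_stack([intercept, x1, x2, x1 + 1e-3 * rng.normal(size=n)])
# Exact collinearity: x3 is an exact linear combination of x1 and x2.
x3 = 2 * x1 - x2
X_exact = np.column_stack([intercept, x1, x2, x3])

for name, X in [("ok", X_ok), ("near", X_near), ("exact", X_exact)]:
    print(f"{name:>5}: rank={np.linalg.matrix_rank(X)}, "
          f"max condition index={condition_indexes(X).max():.3g}")

# Aliasing under exact collinearity: span{1, x1, x2} == span{1, x1, x3}, so
# least squares yields the same fitted values from either subset, and a
# selection criterion based only on the fit cannot tell x2 and x3 apart.
y = 1.0 + 0.5 * x1 - 2.0 * x2 + rng.normal(scale=0.1, size=n)
A = np.column_stack([intercept, x1, x2])
B = np.column_stack([intercept, x1, x3])
yhat_a = A @ np.linalg.lstsq(A, y, rcond=None)[0]
yhat_b = B @ np.linalg.lstsq(B, y, rcond=None)[0]
print("identical fits:", np.allclose(yhat_a, yhat_b))
```

This sketch uses the uncentered column scaling associated with Belsley; centering the columns first would change the indexes and can mask near dependencies that involve the intercept.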
