A Study of the Effects of Multicollinearity in Multivariable Analysis

Multivariable analysis is the most popular approach for investigating associations between risk factors and disease. However, its efficiency depends heavily on the correlation structure among the predictor variables. When the covariates in a model are not independent of one another, collinearity (multicollinearity) problems arise in the analysis, leading to biased estimation. This work performs a simulation study across scenarios with different collinearity structures to investigate the effects of collinearity under various correlation structures among the explanatory variables, and compares the results with existing guidelines for deciding when collinearity is harmful. Three correlation scenarios among predictor variables are considered: (1) a bivariate collinear structure, the simplest collinearity case; (2) a multivariate collinear structure in which an explanatory variable is correlated with two other covariates; and (3) a more realistic scenario in which an independent variable can be expressed by various functions involving the other variables.
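To make the bivariate scenario (1) concrete, the following is a minimal sketch of such a simulation, not the study's actual design. The sample size, number of replications, true coefficients (all hypothetical), the linear outcome model, and the choice of the variance inflation factor as the diagnostic are assumptions made for illustration. It draws two predictors with a chosen correlation, refits ordinary least squares many times, and reports the mean and spread of one slope estimate alongside the variance inflation factor.

```python
import numpy as np


def simulate_slope(rho, n=200, reps=300, seed=0):
    """Mean and SD of the OLS estimate of the x1 slope over repeated samples.

    Assumed (hypothetical) data model: y = 1 + 0.5*x1 + 0.5*x2 + noise,
    where (x1, x2) are standard normal with correlation rho.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    estimates = np.empty(reps)
    for r in range(reps):
        X = rng.multivariate_normal(np.zeros(2), cov, size=n)
        y = 1.0 + 0.5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
        design = np.column_stack([np.ones(n), X])  # add intercept column
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        estimates[r] = beta[1]  # coefficient on x1
    return estimates.mean(), estimates.std(ddof=1)


def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R_j^2),
    where R_j^2 comes from regressing column j on the other columns."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        total = ((X[:, j] - X[:, j].mean()) ** 2).sum()
        r_squared = 1.0 - (resid ** 2).sum() / total
        out[j] = 1.0 / (1.0 - r_squared)
    return out


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.95):
        mean_b1, sd_b1 = simulate_slope(rho)
        print(f"rho={rho:.2f}  mean(b1)={mean_b1:.3f}  sd(b1)={sd_b1:.3f}")
```

Scenarios (2) and (3) could be explored in the same way by changing how the predictor matrix is generated, for example by using a 3x3 covariance matrix or by defining one predictor as a function of the others plus noise.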
