The determinants of cumulative endogeneity bias in multivariate analysis

The BLUE (best linear unbiased estimator) properties of OLS estimators under the classical Gauss-Markov assumptions have encouraged the widespread use of OLS multivariate regression in many empirical studies based on a conceptual model of a single explanatory equation. However, such a model may be only an imperfect empirical approximation to the valid underlying conceptual model, which may contain several important additional interrelationships among the relevant variables. In this paper, we examine the conditions under which the direction of the resulting endogeneity bias in the OLS asymptotic parameter estimates can be predicted for any given endogenous or predetermined variable, and the extent to which simple heuristics can be relied upon in this process. We also identify the underlying structural parameters to which the magnitude of the endogeneity bias is sensitive. The importance of such sensitivity analysis has been underlined by a growing awareness that standard diagnostic tests can shed light only on the existence of endogeneity bias, not on its extent. The paper examines the implications of the analysis for statistical inferences about the true values of the regression coefficients and the validity of the associated t-statistics.
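As a minimal illustration of the kind of bias at issue (a sketch under stated assumptions, not the paper's own analysis), consider a hypothetical two-equation system in which the single equation of interest, y = beta*x + u, omits a feedback relationship x = gamma*y + v, with u and v independent zero-mean errors. Solving the reduced form x = (gamma*u + v)/(1 - gamma*beta) gives Cov(x, u) != 0 whenever gamma != 0, so the OLS slope converges to beta + gamma*var_u*(1 - gamma*beta)/(gamma^2*var_u + var_v) rather than to beta. The function names `plim_ols_slope` and `simulated_ols_slope`, and all parameter values, are hypothetical and chosen only for illustration.

```python
import random


def plim_ols_slope(beta, gamma, var_u, var_v):
    """Probability limit of the OLS slope of y on x in the hypothetical system
    y = beta*x + u, x = gamma*y + v (u, v independent, zero mean)."""
    d = 1.0 - gamma * beta
    if d == 0.0:
        raise ValueError("system has no unique solution when gamma*beta == 1")
    # Asymptotic bias = Cov(x, u) / Var(x), derived from the reduced form of x.
    bias = gamma * var_u * d / (gamma ** 2 * var_u + var_v)
    return beta + bias


def simulated_ols_slope(beta, gamma, var_u, var_v, n=100_000, seed=0):
    """Monte Carlo check: draw data from the system and fit OLS through the origin
    (all variables are zero mean, so no intercept is needed)."""
    d = 1.0 - gamma * beta
    if d == 0.0:
        raise ValueError("system has no unique solution when gamma*beta == 1")
    rng = random.Random(seed)
    sxy = sxx = 0.0
    for _ in range(n):
        u = rng.gauss(0.0, var_u ** 0.5)
        v = rng.gauss(0.0, var_v ** 0.5)
        x = (gamma * u + v) / d  # reduced form: satisfies x = gamma*y + v
        y = beta * x + u         # the single equation an analyst would estimate
        sxy += x * y
        sxx += x * x
    return sxy / sxx


if __name__ == "__main__":
    beta = 0.5  # hypothetical true structural coefficient
    for gamma in (-0.4, 0.0, 0.4):
        print(gamma, plim_ols_slope(beta, gamma, 1.0, 1.0),
              simulated_ols_slope(beta, gamma, 1.0, 1.0))
```

In this toy system, when it is stable (|gamma*beta| < 1, so 1 - gamma*beta > 0), the sign of the asymptotic bias follows the sign of the omitted feedback coefficient gamma, while its magnitude depends on the structural parameters and the error variances. This is one simple case of the direction and sensitivity questions the abstract raises; it is not a general rule for larger systems.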