A Regression Paradox for Linear Models: Sufficient Conditions and Relation to Simpson’s Paradox

An analysis of customer survey data using direct and reverse linear regression leads to inconsistent conclusions about the effect of a group variable. This counterintuitive phenomenon, called the “regression paradox,” produces seemingly contradictory group effects when the predictor and regressand are interchanged. Using analytical derivations as well as geometric arguments, we describe sufficient conditions under which the regression paradox will appear in linear Gaussian models. The results show that the phenomenon depends on a distribution shift between the groups relative to the predictability of the model. As a consequence, the paradox can arise naturally in certain distributions and need not be caused by sampling error or misspecified models. Simulations show that the paradox can also appear in more general, non-Gaussian settings. An interesting geometric connection to Simpson’s paradox is also provided.
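The paradox can be reproduced with a short simulation. The sketch below illustrates one special case under assumptions of our own; it is not the authors' sufficient conditions or code. Two groups are drawn from bivariate Gaussians with unit variances and a common correlation rho (0 < rho < 1), and group 1's mean is shifted by (dx, dy). In this case the group coefficients have closed forms: dy − rho·dx for the direct regression of y on x and group, and dx − rho·dy for the reverse regression of x on y and group. A positive direct coefficient says group 1 has larger y at a given x. A positive reverse coefficient says group 1 needs larger x to reach a given y. So when the two coefficients share a sign, the fits tell opposite stories. For a positive shift this happens exactly when rho < dy/dx < 1/rho, a window that narrows as rho approaches 1. This matches the idea that the paradox depends on the shift relative to the model's predictability. The function names and parameters are hypothetical.

```python
import numpy as np


def simulate_groups(shift, rho, n_per_group=100_000, seed=0):
    """Draw (x, y) for two groups from bivariate Gaussians with unit variances
    and common correlation rho; group 1's mean is shifted by shift = (dx, dy)."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    g0 = rng.multivariate_normal([0.0, 0.0], cov, size=n_per_group)
    g1 = rng.multivariate_normal(list(shift), cov, size=n_per_group)
    xy = np.vstack([g0, g1])
    group = np.concatenate([np.zeros(n_per_group), np.ones(n_per_group)])
    return xy[:, 0], xy[:, 1], group


def group_coefficient(response, predictor, group):
    """Ordinary least squares fit of response ~ 1 + predictor + group;
    returns the coefficient on the group indicator."""
    design = np.column_stack([np.ones_like(predictor), predictor, group])
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return float(coef[2])


def theoretical_coefficients(shift, rho):
    """Closed-form group coefficients for this special case (unit variances,
    common correlation): direct = dy - rho*dx, reverse = dx - rho*dy."""
    dx, dy = shift
    return dy - rho * dx, dx - rho * dy


def regression_paradox(shift, rho, **kwargs):
    """Fit the direct (y on x) and reverse (x on y) regressions and report
    whether the two group coefficients share a sign (the paradox)."""
    x, y, group = simulate_groups(shift, rho, **kwargs)
    direct = group_coefficient(y, x, group)
    reverse = group_coefficient(x, y, group)
    paradox = bool(np.sign(direct) == np.sign(reverse))
    return direct, reverse, paradox
```

For example, `regression_paradox((1.0, 1.0), rho=0.5)` should return estimates near 0.5 for both coefficients and flag the paradox. A shift of `(1.0, 0.0)` instead gives estimates near -0.5 and 1.0, with opposite signs and no paradox. Comparing against `theoretical_coefficients` checks the simulation against the closed form.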
