Efforts to adjust for confounding by neighborhood using complex survey data

In social epidemiology, one often considers neighborhood or contextual effects on health outcomes, in addition to effects of individual exposures. This paper is concerned with the estimation of an individual exposure effect in the presence of confounding by neighborhood effects, motivated by an analysis of National Health Interview Survey (NHIS) data. In the analysis, we operationalize neighborhood as the secondary sampling unit of the survey, which consists of small groups of neighboring census blocks. Thus, the neighborhoods are sampled with unequal probabilities, as are individuals within neighborhoods. We develop and compare several approaches for the analysis of the effect of dichotomized individual-level education on the receipt of adequate mammography screening. In this analysis, neighborhood effects are likely to confound the individual effects, owing to factors such as differential availability of health services and differential neighborhood culture. The approaches can be grouped into three broad classes: ordinary logistic regression for survey data, with either no cluster effect or a fixed effect for each cluster; conditional logistic regression extended for survey data; and generalized linear mixed model (GLMM) regression for survey data. Standard use of GLMMs with small clusters fails to adjust for confounding by cluster (e.g., neighborhood); this motivated us to develop an adaptation. We use theory, simulation, and analyses of the NHIS data to compare and contrast all of these methods. One conclusion is that all of the methods perform poorly when the sampling bias is strong; more research and new methods are clearly needed.
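To make the notion of confounding by cluster concrete, the following is a minimal, unweighted sketch and not the authors' survey-adjusted procedure. It assumes clusters of size two, a normally distributed cluster effect that raises both the probability of exposure and the probability of the outcome, and a true within-cluster log odds ratio `beta`. All function names and parameter values are hypothetical and chosen only for illustration. The pooled (crude) estimate ignores clusters, whereas the conditional estimate uses only pairs that are discordant on both exposure and outcome, which is the standard conditional maximum likelihood estimator for matched pairs with a single binary covariate.

```python
import math
import random


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate_pairs(n_pairs, beta, sd_cluster=1.5, seed=0):
    """Simulate clusters of size 2 in which the cluster effect `a`
    drives both exposure x and outcome y (confounding by cluster)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        a = rng.gauss(0.0, sd_cluster)  # unobserved cluster effect
        members = []
        for _ in range(2):
            x = 1 if rng.random() < expit(a) else 0
            y = 1 if rng.random() < expit(a + beta * x) else 0
            members.append((x, y))
        pairs.append(members)
    return pairs


def crude_log_or(pairs):
    """Log odds ratio from the pooled 2x2 table, ignoring clusters."""
    n = [[0, 0], [0, 0]]  # n[x][y]
    for members in pairs:
        for x, y in members:
            n[x][y] += 1
    return math.log(n[1][1] * n[0][0] / (n[1][0] * n[0][1]))


def conditional_log_or(pairs):
    """Conditional ML estimate for matched pairs: the ratio of pairs in
    which the exposed member alone has the outcome to pairs in which the
    unexposed member alone has it. The cluster effect cancels out."""
    exposed_case = 0
    unexposed_case = 0
    for (x1, y1), (x2, y2) in pairs:
        if x1 != x2 and y1 != y2:
            if (x1, y1) == (1, 1) or (x2, y2) == (1, 1):
                exposed_case += 1
            else:
                unexposed_case += 1
    return math.log(exposed_case / unexposed_case)


if __name__ == "__main__":
    true_beta = 1.0
    data = simulate_pairs(n_pairs=40000, beta=true_beta, seed=1)
    print("true log OR:       ", true_beta)
    print("crude log OR:      ", round(crude_log_or(data), 3))
    print("conditional log OR:", round(conditional_log_or(data), 3))
```

Under these assumptions the crude estimate should overstate the exposure effect, while the conditional estimate should lie close to `beta`. The sketch deliberately omits the unequal sampling probabilities that the paper addresses, so it illustrates only why conditioning on the cluster removes cluster-level confounding, not how the estimators behave under informative sampling.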
