Comparison of design-based and model-based methods to estimate the variance using National Population Health Survey data

This paper compares design-based and model-based methods for analyzing complex survey data. The analysis of survey data collected under a multi-stage sampling design should account for stratification, clustering and unequal inclusion probabilities. We compared the Rao-Wu bootstrap and Taylor linearization (design-based approaches) with logistic regression based on the generalized estimating equations (GEE) approach (a model-based method). The design-based and model-based approaches were applied and compared using Wave 5 (2002–03) of the National Population Health Survey (NPHS) dataset. The NPHS, based on an initial stratified multi-stage design, is a continuing longitudinal study that collects general health information on the Canadian population. Logistic regression was used because the variable of interest for this study was binary, namely self-reported physician-diagnosed asthma. When the three features of the complex survey design were overlooked, the standard errors obtained were underestimated. However, when all three features of the survey design were accounted for, the design-based and model-based methods produced similar parameter estimates, while the design-based methods yielded larger standard errors than their model-based counterpart.
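The effect of ignoring the design on standard errors can be illustrated with a small sketch. The following is a toy example on synthetic data, not the NPHS and not the authors' code: it estimates a proportion and compares a naive simple-random-sampling standard error with a Taylor-linearized one that uses stratum and primary sampling unit (PSU) identifiers. The function names `naive_se` and `linearized_se`, the with-replacement PSU approximation, and the made-up data (equal weights, outcomes identical within each PSU) are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def naive_se(y):
    """Standard error of a proportion, treating the data as a simple random sample."""
    n = len(y)
    p = sum(y) / n
    return math.sqrt(p * (1 - p) / n)


def linearized_se(y, w, stratum, psu):
    """Taylor-linearized standard error of the weighted proportion
    sum(w*y)/sum(w), assuming PSUs are sampled with replacement
    within strata (a common approximation, assumed here)."""
    total_w = sum(w)
    p = sum(wi * yi for wi, yi in zip(w, y)) / total_w

    # Sum the linearized values z_i = w_i * (y_i - p) / sum(w) within each PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, c in zip(y, w, stratum, psu):
        psu_totals[h][c] += wi * (yi - p) / total_w

    # Between-PSU variability within each stratum, summed over strata.
    var = 0.0
    for totals in psu_totals.values():
        n_h = len(totals)
        mean_h = sum(totals.values()) / n_h
        ss = sum((t - mean_h) ** 2 for t in totals.values())
        var += n_h / (n_h - 1) * ss
    return math.sqrt(var)


# Synthetic data: 2 strata, 4 PSUs per stratum, 5 respondents per PSU.
# Everyone in a PSU shares the same outcome, so clustering is extreme.
pattern = {1: [1, 0, 1, 0], 2: [1, 1, 0, 0]}
y, w, stratum, psu = [], [], [], []
for h, outcomes in pattern.items():
    for c, value in enumerate(outcomes):
        for _ in range(5):
            y.append(value)
            w.append(1.0)
            stratum.append(h)
            psu.append(c)

se_naive = naive_se(y)
se_design = linearized_se(y, w, stratum, psu)
print("naive SE:", round(se_naive, 4))
print("linearized SE:", round(se_design, 4))
```

In this toy data the linearized standard error is larger than the naive one, because the 40 respondents carry little more information than the 8 PSUs they belong to. Real surveys have weaker clustering, and the size of the gap depends on the intra-cluster correlation and the weights.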
