Innovative strategies using SUDAAN for analysis of health surveys with complex samples

Large-scale health surveys provide a wealth of information for addressing problems in health sciences research. Designed for multiple purposes, these surveys frequently have large sample sizes and extensive measurements of demographic and socioeconomic characteristics, risk factors, disease outcomes, and health care service use and costs. The complex sampling designs typically used to select the survey sample, coupled with the vast amount of information available in the survey database, raise issues that must be addressed during data processing and analysis. Numerous articles have debated whether, and how, to account for features of the sample design during data analysis. Traditional statistical methods for simple random samples, and the software that accompanies them, have historically been unable to account for the survey design. Recent advances in statistical methodology for survey data analysis have greatly expanded the analytical tools available to the survey analyst, and commercial software packages that incorporate these methods let the analyst apply them to large survey databases easily and efficiently. We present an overview of analysis strategies for survey data and illustrate their application with the SUDAAN software system. Examples are drawn from two large US health surveys, the National Health Interview Survey and the Longitudinal Study of Aging. Both cross-sectional and longitudinal questions are addressed. The examples involve logistic regression, time-to-event analysis, and repeated measures analysis.
