Although “choose all that apply” questions are common in modern surveys, methods for analyzing associations among responses to such questions have only recently been developed. These methods are generally valid only for simple random sampling, but these types of questions often appear in surveys conducted under more complex sampling plans. The purpose of this article is to provide statistical analysis methods that can be applied to “choose all that apply” questions in complex survey sampling situations. Loglinear models are developed to incorporate the multiple responses inherent in these types of questions. Statistics to compare models and to measure association are proposed and their asymptotic distributions are derived. Monte Carlo simulations show that tests based on adjusted Pearson statistics generally hold their correct size when comparing models. These simulations also show that confidence intervals for odds ratios estimated from loglinear models have good coverage properties, while being shorter than those constructed using empirical estimates. Furthermore, the methods are shown to be applicable to more general problems of modeling associations between elements of two or more binary vectors. The proposed analysis methods are applied to data from the National Health and Nutrition Examination Survey. The Canadian Journal of Statistics © 2009 Statistical Society of Canada
Although “choose all that apply” questions are common in modern surveys, methods for analyzing the associations among responses to such questions have only just been developed. These methods are usually valid only for simple random samples, yet questions of this kind often appear in surveys conducted under much more complex sampling designs. The aim of this article is to provide statistical analysis methods that can be applied to “choose all that apply” questions in surveys that use complex sampling designs. Loglinear models are developed that incorporate the multiple responses inherent in this type of question. Statistics for comparing models and for measuring association are proposed, and their asymptotic distributions are obtained. Monte Carlo simulations show that tests based on adjusted Pearson statistics generally maintain their level when used to compare models. These studies also show that confidence intervals for odds ratios estimated from the loglinear models have good coverage properties while being shorter than those based on empirical estimates. In addition, the methods are shown to apply in the more general setting of modeling association between the elements of two or more binary vectors. The proposed analysis methods are applied to data from the American study “National Health and Nutrition Examination Survey” (NHANES). La revue canadienne de statistique © 2009 Société statistique du Canada
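The "empirical estimates" of odds ratios mentioned in the abstract can be illustrated with a minimal sketch. It assumes each "choose all that apply" question is coded as a binary matrix (one row per respondent, one column per item) and that each respondent carries a sampling weight. The function name `pairwise_odds_ratios`, the variable names, and the toy data are hypothetical and not taken from the article. The sketch computes weighted point estimates only; it does not fit the loglinear models, apply the adjusted Pearson statistics, or produce design-based confidence intervals.

```python
import numpy as np

def pairwise_odds_ratios(W, Y, weights=None):
    """Weighted empirical odds ratio for every item pair (W_i, Y_j).

    W: (n, I) binary array, responses to the first question.
    Y: (n, J) binary array, responses to the second question.
    weights: optional length-n sampling weights (assumed; defaults to 1).
    Returns an (I, J) array of odds ratios.
    """
    W = np.asarray(W, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = W.shape[0]
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    wcol = w[:, None]
    # Weighted counts for the four cells of each 2x2 table.
    n11 = (W * wcol).T @ Y              # item i picked, item j picked
    n10 = (W * wcol).T @ (1 - Y)        # item i picked, item j not picked
    n01 = ((1 - W) * wcol).T @ Y        # item i not picked, item j picked
    n00 = ((1 - W) * wcol).T @ (1 - Y)  # neither picked
    # Note: a zero in n10 or n01 gives inf or nan; no continuity
    # correction is applied in this sketch.
    return (n11 * n00) / (n10 * n01)

# Toy data: 8 respondents, 2 items in question W, 1 item in question Y.
W = [[1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1], [1, 1], [0, 0]]
Y = [[1], [1], [0], [0], [0], [1], [1], [0]]
unweighted = pairwise_odds_ratios(W, Y)
weighted = pairwise_odds_ratios(W, Y, weights=[2, 1, 1, 1, 1, 1, 1, 1])
print(unweighted)
print(weighted)
```

Under a complex sampling plan the weights change the point estimates, as the second call shows. Valid standard errors would additionally require the design information (strata, clusters) that the article's methods account for.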