Impact of small group size on neighbourhood influences in multilevel models

Background Given the growing availability of multilevel data from national surveys, researchers interested in contextual effects may find themselves with a small number of individuals per group. Although there is a growing body of literature on sample size in multilevel modelling, few studies have explored the impact of group sizes of fewer than five.

Methods In a simulated analysis of real data, the impact of group sizes of fewer than five was examined for both a continuous and a dichotomous outcome in a simple two-level multilevel model. Models with group sizes of one to five were compared with models with complete data. Four different linear and logistic models were examined: empty models; models with a group-level covariate; models with an individual-level covariate; and models with an aggregated group-level covariate. The study further evaluated whether the impact of small group size differed depending on the total number of groups.

Results When the number of groups was large (N=459), neither fixed nor random components were affected by small group size, even when 90% of tracts had only one individual per tract and even when an aggregated group-level covariate was examined. As the number of groups decreased, the SE estimates of both fixed and random effects were inflated. Furthermore, group-level variance estimates were more affected than were fixed components.

Conclusions Analyses of datasets with a small to moderate number of groups, most of which are very small (n<5), may fail to find or even consider a group-level effect when one exists, and may also be underpowered to detect fixed effects.
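The subsampling design described in the Methods can be illustrated with a small simulation. The sketch below is not the study's procedure: it simulates data from an empty two-level model rather than using real survey data, and it estimates variance components with a one-way ANOVA (method-of-moments) estimator instead of fitting likelihood-based multilevel models. The function names, the variance values (1.0 and 9.0) and the full group size of 20 are all assumptions chosen for illustration; only the number of groups (459) comes from the abstract.

```python
import numpy as np


def simulate_two_level(n_groups, group_size, tau2, sigma2, rng):
    """Simulate y_ij = u_j + e_ij, u_j ~ N(0, tau2), e_ij ~ N(0, sigma2)."""
    groups = np.repeat(np.arange(n_groups), group_size)
    u = rng.normal(0.0, np.sqrt(tau2), size=n_groups)
    y = u[groups] + rng.normal(0.0, np.sqrt(sigma2), size=groups.size)
    return groups, y


def subsample(groups, y, max_size, rng):
    """Keep at most max_size randomly chosen individuals per group."""
    keep = []
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        take = min(max_size, idx.size)
        keep.extend(rng.choice(idx, size=take, replace=False))
    keep = np.sort(np.array(keep))
    return groups[keep], y[keep]


def anova_variance_components(groups, y):
    """Method-of-moments estimates (between-group, within-group variance)."""
    labels, inverse, counts = np.unique(
        groups, return_inverse=True, return_counts=True
    )
    n_total, n_groups = y.size, labels.size
    if n_total <= n_groups:
        # Groups of size one carry no within-group information.
        raise ValueError("need at least one group with more than one member")
    means = np.bincount(inverse, weights=y) / counts
    ssw = np.sum((y - means[inverse]) ** 2)
    ssb = np.sum(counts * (means - y.mean()) ** 2)
    msw = ssw / (n_total - n_groups)
    msb = ssb / (n_groups - 1)
    # Effective group size for unbalanced data.
    n0 = (n_total - np.sum(counts ** 2) / n_total) / (n_groups - 1)
    tau2_hat = max((msb - msw) / n0, 0.0)  # truncate negative estimates at 0
    return tau2_hat, msw


def icc(tau2, sigma2):
    """Intraclass correlation: share of variance at the group level."""
    return tau2 / (tau2 + sigma2)


rng = np.random.default_rng(0)
# Hypothetical settings: 459 groups of 20, true ICC = 1 / (1 + 9) = 0.1.
groups, y = simulate_two_level(459, 20, tau2=1.0, sigma2=9.0, rng=rng)
for max_size in (20, 5, 2):
    g_sub, y_sub = subsample(groups, y, max_size, rng)
    tau2_hat, sigma2_hat = anova_variance_components(g_sub, y_sub)
    print(max_size, round(icc(tau2_hat, sigma2_hat), 3))
```

Repeating the loop over many random seeds, and with fewer groups, would give a rough sense of how the spread of the group-level variance estimate grows as groups become smaller and fewer. Reproducing the logistic models or the covariate models from the study would require a proper mixed-model fitting routine.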
