Addressing Data Sparseness in Contextual Population Research

The use of multilevel modeling with data from population-based surveys is often limited by the small number of cases per Level 2 unit, which has prompted a recent trend in the neighborhood literature of applying clustering techniques to address the problem of data sparseness. In this study, the authors use Monte Carlo simulations to investigate the effects of marginal group sizes on multilevel model performance, bias, and efficiency. They then employ cluster analysis techniques to reduce data sparseness and examine the consequences in the simulations. They find that estimates of the fixed effects remain robust even at the extremes of data sparseness, and that cluster analysis is an effective strategy for increasing group size and preventing the overestimation of variance components. However, researchers should be cautious about how far they apply such clustering techniques, because merging units introduces artificial within-group heterogeneity.
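The kind of Monte Carlo experiment described above can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a balanced two-level random-intercept model and swaps in a simple one-way ANOVA estimator of the between-group variance in place of a fitted multilevel model. The function names (`anova_tau2`, `simulate`) and all parameter values (100 groups, between-group variance 0.2, within-group variance 1.0, 500 replications) are hypothetical choices made only for illustration.

```python
import numpy as np


def anova_tau2(y):
    """One-way ANOVA estimate of between-group variance.

    y is a balanced array of shape (n_groups, group_size), group_size >= 2.
    """
    n = y.shape[1]
    msb = n * y.mean(axis=1).var(ddof=1)   # between-group mean square
    msw = y.var(axis=1, ddof=1).mean()     # pooled within-group mean square
    return (msb - msw) / n                 # left untruncated, so it can be negative


def simulate(group_size, n_groups=100, tau2=0.2, sigma2=1.0, reps=500, seed=0):
    """Return (bias, rmse) of the estimator over `reps` simulated data sets.

    All default values are illustrative assumptions, not values from the study.
    """
    rng = np.random.default_rng(seed)
    est = np.empty(reps)
    for r in range(reps):
        # Level 2 random intercepts, one per group
        u = rng.normal(0.0, np.sqrt(tau2), size=(n_groups, 1))
        # Level 1 residuals, one per individual
        e = rng.normal(0.0, np.sqrt(sigma2), size=(n_groups, group_size))
        est[r] = anova_tau2(u + e)
    bias = est.mean() - tau2
    rmse = np.sqrt(np.mean((est - tau2) ** 2))
    return bias, rmse


if __name__ == "__main__":
    for size in (2, 5, 30):
        bias, rmse = simulate(size)
        print(f"group size {size:>2}: bias={bias:+.4f}  rmse={rmse:.4f}")
```

Under these assumptions the estimator becomes markedly less efficient as groups shrink, which is the efficiency side of the sparseness problem. The sketch does not attempt to reproduce the reported overestimation of variance components, which concerns fitted multilevel models rather than this ANOVA estimator.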
