Modeling Certainty with Clustered Data: A Comparison of Methods

Political scientists often analyze data in which observational units are clustered into politically or socially meaningful groups, with the aim of estimating the effects of group-level factors on individual-level behavior. Statisticians have long known that ignoring the clustered nature of such data overstates the precision of estimated group-level effects, even when intracluster correlation is low. Although a number of methods that account for clustering are available, their precision estimates are poorly understood, making it difficult for researchers to choose among approaches. In this paper, we explicate and compare commonly used methods of estimating the standard errors (SEs) for group-level effects: clustered robust SEs, random effects, hierarchical linear models, and aggregated ordinary least squares. We demonstrate analytically and with the help of empirical examples that under ideal conditions there is no meaningful difference in the SEs generated by these methods. We conclude with advice on the ways in which analysts can increase the efficiency of clustered designs.
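The comparison described above can be illustrated with a small simulation. The sketch below is not the authors' analysis: the function name compare_ses, the data-generating process, and every parameter value (200 balanced clusters of 20 units, an intracluster correlation of 0.05, a slope of 0.5) are assumptions chosen for illustration. It covers only three estimators (naive pooled OLS, clustered robust SEs without a small-sample correction, and OLS on cluster means) and omits random effects and hierarchical linear models.

```python
import numpy as np


def compare_ses(n_clusters=200, cluster_size=20, icc=0.05, seed=0):
    """Simulate balanced clustered data with a group-level regressor and
    return three SE estimates for its slope (all settings are assumed)."""
    rng = np.random.default_rng(seed)
    k = 2  # intercept + one group-level regressor

    # Cluster index for each individual; clusters are stored contiguously.
    g = np.repeat(np.arange(n_clusters), cluster_size)

    # Group-level regressor, cluster random effect, individual noise.
    # Total error variance is 1, so the cluster share equals the ICC.
    x_group = rng.normal(size=n_clusters)
    u = rng.normal(scale=np.sqrt(icc), size=n_clusters)
    e = rng.normal(scale=np.sqrt(1.0 - icc), size=n_clusters * cluster_size)
    y = 1.0 + 0.5 * x_group[g] + u[g] + e

    # Pooled OLS on individual-level data.
    X = np.column_stack([np.ones(g.size), x_group[g]])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # 1. Naive SE: treats all individuals as independent.
    sigma2 = resid @ resid / (g.size - k)
    se_naive = np.sqrt(sigma2 * XtX_inv[1, 1])

    # 2. Clustered robust SE: sandwich estimator built from
    #    within-cluster sums of the scores x_i * residual_i.
    cluster_scores = np.zeros((n_clusters, k))
    np.add.at(cluster_scores, g, X * resid[:, None])
    meat = cluster_scores.T @ cluster_scores
    se_clustered = np.sqrt((XtX_inv @ meat @ XtX_inv)[1, 1])

    # 3. Aggregated OLS: regress cluster means of y on the group regressor.
    y_bar = y.reshape(n_clusters, cluster_size).mean(axis=1)
    Xa = np.column_stack([np.ones(n_clusters), x_group])
    XaXa_inv = np.linalg.inv(Xa.T @ Xa)
    resid_a = y_bar - Xa @ (XaXa_inv @ Xa.T @ y_bar)
    sigma2_a = resid_a @ resid_a / (n_clusters - k)
    se_aggregated = np.sqrt(sigma2_a * XaXa_inv[1, 1])

    return {
        "naive": se_naive,
        "clustered": se_clustered,
        "aggregated": se_aggregated,
    }


if __name__ == "__main__":
    for name, se in compare_ses().items():
        print(f"{name:>10}: {se:.4f}")
```

Under these assumed settings, the naive SE should come out noticeably smaller than the other two, while the clustered robust and aggregated SEs should be close to each other, which is consistent with the abstract's claim about ideal (here, balanced and homoskedastic) conditions.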
