Generalized Estimating Equations

Correlated data arise when multiple observations are collected from the same sampling unit (e.g., repeated measurements on a bank over time, or hormone levels in a breast cancer patient over time), or when observations are clustered by a shared characteristic (e.g., observations on different banks grouped by zip code, or on cancer patients from the same clinic). The generalized linear model framework for independent data is extended to correlated data by introducing second-order variance components directly into the estimating equation of the independence model. Because it generalizes the independence model's estimating equation, the result is called a generalized estimating equation (GEE). This article discusses the foundations of GEEs and how user-specified correlation structures are accommodated in the model-building process. It also discusses how GEEs relate to the underlying generalized linear model framework and points out alternative approaches to modeling correlated data, such as fixed-effects and random-effects models.

Keywords: working correlation matrix; sandwich estimate of variance; generalized linear models; subject-specific models; population-averaged models
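To make the extension of the estimating equation concrete, the sketch below writes it out in commonly used notation. The symbols are our own labels and are not taken from this article. For cluster i with response vector y_i, mean mu_i = g^{-1}(X_i beta), derivative matrix D_i = d mu_i / d beta^T, and A_i = diag{v(mu_ij)} holding the GLM variance function, the independence equation is generalized by replacing A_i with a working covariance V_i built from a user-specified working correlation matrix R_i(alpha) and a dispersion parameter phi:

```latex
% Independence GLM estimating equation (the dispersion \phi cancels)
\sum_{i=1}^{n} D_i^{\top} A_i^{-1} \left( y_i - \mu_i \right) = 0

% GEE: replace A_i by the working covariance V_i
\sum_{i=1}^{n} D_i^{\top} V_i^{-1} \left( y_i - \mu_i \right) = 0,
\qquad V_i = \phi \, A_i^{1/2} R_i(\alpha) \, A_i^{1/2}

% Common working correlation structures, off-diagonal entries (j \neq k)
\text{independence: } R_{jk} = 0, \qquad
\text{exchangeable: } R_{jk} = \alpha, \qquad
\text{AR(1): } R_{jk} = \alpha^{|j-k|}

% Sandwich estimate of variance, evaluated at \hat\beta, \hat\alpha, \hat\phi
\widehat{\operatorname{Var}}(\hat\beta) = B^{-1} M B^{-1}, \qquad
B = \sum_{i=1}^{n} D_i^{\top} V_i^{-1} D_i, \qquad
M = \sum_{i=1}^{n} D_i^{\top} V_i^{-1} (y_i - \mu_i)(y_i - \mu_i)^{\top} V_i^{-1} D_i
```

Setting R_i(alpha) = I gives V_i = phi A_i, and the GEE reduces to the independence equation. Under standard regularity conditions, the sandwich estimate remains consistent for the variance of beta-hat even when the working correlation is misspecified, provided the mean model is correct.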