Missing Data in Longitudinal Studies: Strategies for Bayesian Modeling and Sensitivity Analysis by DANIELS, M. J. and HOGAN, J. W.

In most studies, the intended (full) data, i.e., the data that the study investigators wish to collect, are inevitably incompletely observed. In modern studies, the full data are typically high dimensional, usually comprising many baseline and time-varying variables. Scientific interest, however, often focuses on some low-dimensional parameter of the distribution of the full data. Specification of realistic parametric models for the mechanism generating high-dimensional data is most often very challenging, if not impossible. Nonparametric and semiparametric models, i.e., models in which the data-generating process is characterized by parameters ranging over a large, non-Euclidean space and, possibly, also a few meaningful real-valued parameters, meet the analytic challenge posed by these high-dimensional data because they do not make assumptions about the components of the full data distribution that are of little scientific interest. Analytic strategies based on semiparametric models avoid the possibility of incorrect inferences due to misspecification of models for the secondary parts of the full data law. Drawing from the modern theory of semiparametric efficient inference developed since the 1980s, Robins and Rotnitzky (1992) derived a general estimating equations methodology in coarsened, i.e., incompletely observed, data models under non- or semiparametric models for arbitrary full data configurations. This methodology, based on the geometry of scores and influence functions, applies when the