Nonparametric Regression Methods for Longitudinal Data Analysis: Mixed-Effects Modeling Approaches

Modern datasets have become increasingly complex, raising nontraditional modeling and inferential challenges. In particular, longitudinal studies have grown in both the number of subjects and the number of observations per subject. In this context, it has become increasingly necessary to adapt old tools and design new statistical methods to extract information without stringent parametric assumptions about the population or subject trajectories. Nonparametric methods for longitudinal studies are a powerful and fast-developing set of tools that have become computationally feasible over the past decade. The authors provide an excellent review of various nonparametric methods, including local polynomial regression, regression splines, smoothing splines, and penalized spline regression. The link between these methods and mixed models is described, as well as the implications of this link for model fitting and inference using standard mixed-model software. The authors use three datasets as showcases for the various nonparametric methodologies: (1) the progesterone data, with 25 daily observations for 22 conceptive and 69 nonconceptive menstrual cycles aligned at the day of ovulation; (2) the AIDS Clinical Trials Group 388 (ACTG 388) data, with up to 17 observations over 120 weeks of CD4 counts for 166 HIV-1 infected subjects under highly active antiretroviral therapy treatment; and (3) the Multi-Center AIDS Cohort Study data, with up to 13 observations of CD4 percent depletion for 283 men after HIV infection. All these datasets have nonlinear, smoothly varying population or group means and show high between-subject and low within-subject variability. Thus, a nonparametric population (or group) mean with random subject intercept would fit all three datasets reasonably well.
The authors provide a short, but fairly thorough, review of parametric mixed models and discuss how the associated inferential machinery extends naturally to nonparametric regression for longitudinal data. Although this equivalence has far-reaching implications for model fitting, it does not lead immediately to asymptotic results in the nonparametric context where, unlike standard parametric mixed models, the outcome vector typically cannot be partitioned into independent subvectors. To their credit, the authors seem to recognize this issue and avoid the common pitfall of automatically extending the asymptotics of the parametric case without modification. About half of the book (Chaps. 3–7) is dedicated to describing nonparametric methods for fitting particular cases of the subject-level nonparametric model

$$y_{ij} = \eta(t_{ij}) + v_i(t_{ij}) + \epsilon_{ij}, \quad i = 1, \ldots, n, \quad j = 1, \ldots, n_i, \qquad (1)$$

where $y_{ij}$ is the outcome for subject $i$ at design point $t_{ij}$; $\{\epsilon_{ij}\}$ is the error process, typically assumed to be independent and identically distributed; $n$ is the number of subjects; and $n_i$ is the number of observations for subject $i$. All methods use essentially two smoothing parameters: one for the population mean $\eta(\cdot)$ and one for all subject deviations from the population mean $v_i(\cdot)$, $i = 1, \ldots, n$. These parameters are different for each particular smoothing method: kernel bandwidth for local polynomials, number of knots for regression splines, or ratio of variances for smoothing and penalized splines. The general message of the book, which matches other recommendations in the literature, is that the essential part of effective nonparametric modeling is estimation of the smoothing parameters rather than the particular choice of nonparametric regression framework. Probably the most important contribution of these chapters is to provide a unified mixed-effects inferential framework for nonparametric regression methods that could seem unrelated.
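To make the role of the variance-ratio smoothing parameter concrete, the following Python sketch simulates a special case of model (1) in which each subject deviation is reduced to a random intercept, then estimates the population mean with a penalized spline. It is an illustration under stated assumptions, not the book's procedure: the truncated-line basis, the function names, the simulated curve, and the fixed value of `lam` are all hypothetical choices, and the two-step fit (pooled penalized spline, then per-subject residual means) is a crude stand-in for a joint mixed-model fit.

```python
import numpy as np

def truncated_line_basis(t, knots):
    """Unpenalized columns [1, t] and penalized columns (t - knot)_+."""
    X = np.column_stack([np.ones_like(t), t])
    Z = np.maximum(t[:, None] - knots[None, :], 0.0)
    return X, Z

def fit_penalized_spline(t, y, knots, lam):
    """Ridge-type fit in which only the spline coefficients are penalized.

    In the mixed-model reading, lam plays the role of the variance ratio
    sigma_eps^2 / sigma_u^2; here it is fixed by hand (an assumption).
    """
    X, Z = truncated_line_basis(t, knots)
    C = np.hstack([X, Z])
    D = np.diag([0.0] * X.shape[1] + [1.0] * Z.shape[1])
    return np.linalg.solve(C.T @ C + lam * D, C.T @ y)

def predict(t, knots, coef):
    X, Z = truncated_line_basis(t, knots)
    return np.hstack([X, Z]) @ coef

# Simulate y_ij = eta(t_ij) + b_i + eps_ij (random intercept in place of v_i).
rng = np.random.default_rng(0)
n, n_i = 50, 20                                  # subjects, observations each
t = np.tile(np.linspace(0.0, 1.0, n_i), n)
subject = np.repeat(np.arange(n), n_i)
eta = np.sin(2.0 * np.pi * t)                    # hypothetical population mean
b = rng.normal(0.0, 1.0, size=n)                 # high between-subject variability
y = eta + b[subject] + rng.normal(0.0, 0.2, size=n * n_i)

# Step 1: pooled penalized-spline estimate of the population mean.
knots = np.linspace(0.05, 0.95, 15)
coef = fit_penalized_spline(t, y, knots, lam=0.1)
eta_hat = predict(t, knots, coef)

# Step 2: crude subject intercepts as per-subject means of the residuals.
resid = y - eta_hat
b_hat = np.array([resid[subject == i].mean() for i in range(n)])
```

In practice the smoothing parameter would be estimated (for example by REML in mixed-model software) rather than fixed, which is the step the book identifies as the essential part of the modeling.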
This methodological consolidation is necessary and timely. The models considered are clearly well suited for the datasets described in the book and, indeed, for many datasets that appear in practice. In fact, the full model in (1) may not even be necessary, but the authors do not discuss testing for model complexity. For example, it would be interesting to test whether subject deviations from the population mean, $v_i(\cdot)$, could be replaced by simpler parametric components, such as fixed or random intercepts. A limitation of the models considered in this book is that they are not designed to accommodate high between- and within-subject variability because they use only one smoothing parameter for all subject-level functions. Moreover, the methods described may not scale well to datasets that are, say, 10 times larger. This is good news, of course, for other research-oriented statisticians working in this area. In Chapter 8, Wu and Zhang describe estimation methods for semiparametric models, that is, models that contain both parametric and nonparametric components. The most complex model considered is $y_{ij} = c_{ij}^{T}\alpha + \eta(t_{ij}) + h_{ij}^{T} a_i + v_i(t_{ij}) + \epsilon_{ij}$,