Generalized additive models for cancer mapping with incomplete covariates.

Maps of cancer incidence rates have become useful tools in public health research, revealing how disease rates vary across space. Typically, these maps are generated from count data aggregated over areas such as counties or census blocks. However, with the proliferation of geographic information systems and related databases, it is becoming easier to obtain exact spatial locations for cancer cases and suitable control subjects. Such point data allow us to adjust for individual-level covariates, such as age and smoking status, when estimating the spatial variation in disease risk. Unfortunately, this covariate information is often incomplete. We propose a method for mapping cancer risk when covariates are not completely observed. We model these data using a logistic generalized additive model and obtain estimates of the linear and non-linear effects from its mixed effects model representation. We develop an EM algorithm that accounts for both the missing covariate values and the random effects. Because the expectation step involves an intractable integral, we evaluate it approximately using the Laplace method. This framework provides a general method for handling missing covariate values when fitting generalized additive models. We illustrate our method through an analysis of cancer incidence data from Cape Cod, Massachusetts. These analyses demonstrate that standard complete-case methods can yield biased estimates of the spatial variation of cancer risk.
