Measurement error in two‐stage analyses, with application to air pollution epidemiology

Public health researchers often estimate health effects of exposures (e.g., pollution, diet, lifestyle) that cannot be directly measured for study subjects. A common strategy in environmental epidemiology is to use a first-stage (exposure) model to estimate the exposure based on covariates and/or spatio-temporal proximity, and then to use predictions from the exposure model as the covariate of interest in the second-stage (health) model. This induces a complex form of measurement error. We propose an analytical framework and methodology that is robust to misspecification of the first-stage model and provides valid inference for the second-stage model parameter of interest. We decompose the measurement error into components analogous to classical and Berkson error, and we characterize properties of the second-stage estimator when the first-stage model predictions are plugged in without correction. Specifically, we derive conditions for compatibility between the first- and second-stage models that guarantee consistency; these conditions have direct and important real-world design implications. We also derive an asymptotic estimate of finite-sample bias when the compatibility conditions are satisfied. We propose a methodology that (1) corrects for finite-sample bias and (2) correctly estimates standard errors. We demonstrate the utility of our methodology in simulations and an example from air pollution epidemiology.
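
The two-stage plug-in strategy described above can be illustrated with a small simulation. This is only a minimal sketch under assumptions that are not taken from the paper: both stages are ordinary linear regressions, the exposure depends on two hypothetical geographic covariates, and the function name `simulate_plug_in` and all parameter values are invented for illustration. It shows the uncorrected plug-in estimator only, not the proposed bias correction or standard error estimation.

```python
import numpy as np


def simulate_plug_in(n_monitors=200, n_subjects=2000, beta=0.5, seed=0):
    """Toy two-stage analysis: fit an exposure model at monitors, then
    plug its predictions into a health model for the study subjects.

    All names and parameter values here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical geographic covariates at monitor and subject locations.
    r_mon = rng.normal(size=(n_monitors, 2))
    r_sub = rng.normal(size=(n_subjects, 2))

    # True exposure: linear in the covariates plus unexplained variation.
    gamma = np.array([1.0, -0.5])
    x_mon = r_mon @ gamma + rng.normal(scale=0.5, size=n_monitors)
    x_sub = r_sub @ gamma + rng.normal(scale=0.5, size=n_subjects)

    # First stage: least-squares exposure model using monitor data only.
    a_mon = np.column_stack([np.ones(n_monitors), r_mon])
    gamma_hat, *_ = np.linalg.lstsq(a_mon, x_mon, rcond=None)

    # Predicted exposure for subjects, whose true exposure is unobserved.
    a_sub = np.column_stack([np.ones(n_subjects), r_sub])
    w_sub = a_sub @ gamma_hat

    # Health outcome depends on the true exposure, not the prediction.
    y = 1.0 + beta * x_sub + rng.normal(scale=1.0, size=n_subjects)

    # Second stage: regress the outcome on the predicted exposure.
    b = np.column_stack([np.ones(n_subjects), w_sub])
    coef, *_ = np.linalg.lstsq(b, y, rcond=None)

    # Return the plug-in slope and the measurement error x - w.
    return coef[1], x_sub - w_sub


beta_hat, error = simulate_plug_in()
print(f"plug-in estimate of beta: {beta_hat:.3f}")
print(f"measurement error variance: {error.var():.3f}")
```

In this toy setting the first-stage model is correctly specified and the monitors and subjects share the same covariate distribution, so the plug-in estimate should land near the true `beta`. Changing either assumption is one way to explore when the uncorrected estimator degrades.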
