At-risk-measure Sampling in Case–Control Studies with Aggregated Data

Transient exposures are difficult to measure in epidemiologic studies, especially when both the exposure and the status of being at risk for an outcome change over time and space, as when estimating the effect of the built environment on transportation injury. Contemporary “big data” generated by mobile sensors can improve measurement of transient exposures. Exposure information generated by these devices typically captures only a sample of the target cohort's experience, so a case–control framework may be useful. However, to preserve anonymity, the data may not be available at the individual level, which precludes a case–crossover approach. We present a method called at-risk-measure sampling. Its goal is to estimate the denominator of an incidence rate ratio (the ratio of the exposed to the unexposed measure of the at-risk experience) given an aggregated summary of the at-risk measure from a cohort. Rather than sampling individuals or locations, the method samples the measure of the at-risk experience. Specifically, the method as presented samples person–distance and person–events summarized by location. It is illustrated with data from a mobile app used to record bicycling. The method extends an established case–control sampling principle: sample the at-risk experience of a cohort study such that the sampled exposure distribution approximates that of the cohort. It is distinct from density sampling in that the sample remains in the form of the at-risk measure, which may be continuous, such as person–time or person–distance. This aspect may be both logistically and statistically efficient if such a sample is already available, for example from big-data sources such as aggregated mobile-sensor data.
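The arithmetic behind this idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the `exposed` and `person_km` field names, and all numbers are hypothetical. It assumes the sampled person–distance is available per location (for example, per road segment), that each location is classified as exposed or unexposed, and that the sampled exposure distribution approximates that of the cohort.

```python
def sum_measure_by_exposure(segments):
    """Total the sampled at-risk measure (here person-km) by exposure status.

    `segments` is a list of dicts with hypothetical keys:
      "exposed"   -> bool, exposure status of the location
      "person_km" -> float, sampled person-distance summarized at that location
    """
    totals = {True: 0.0, False: 0.0}
    for seg in segments:
        totals[seg["exposed"]] += seg["person_km"]
    return totals[True], totals[False]


def incidence_rate_ratio(cases_exposed, cases_unexposed,
                         measure_exposed, measure_unexposed):
    """Estimate the incidence rate ratio as the exposed-to-unexposed ratio of
    cases divided by the exposed-to-unexposed ratio of the sampled measure."""
    if min(cases_unexposed, measure_exposed, measure_unexposed) <= 0:
        raise ValueError("denominators must be positive")
    case_ratio = cases_exposed / cases_unexposed
    measure_ratio = measure_exposed / measure_unexposed  # the estimated denominator
    return case_ratio / measure_ratio


# Hypothetical aggregated sample: person-km by segment, no individual records.
sample = [
    {"exposed": True, "person_km": 60.0},
    {"exposed": True, "person_km": 40.0},
    {"exposed": False, "person_km": 400.0},
]
exposed_km, unexposed_km = sum_measure_by_exposure(sample)
irr = incidence_rate_ratio(30, 60, exposed_km, unexposed_km)
```

Because only the ratio of the exposed to the unexposed measure enters the estimate, the sample does not need to reflect the cohort's absolute person–distance, only its exposure distribution.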
