Missing value imputation for physical activity data measured by accelerometer

An accelerometer, a wearable motion sensor worn on the hip or wrist, is becoming a popular tool in clinical and epidemiological studies for measuring physical activity. Such data provide a series of activity counts recorded every minute or even more frequently and display a person's activity pattern throughout the day. Unfortunately, the collected data can include irregular missing intervals because of participant noncompliance, which makes the statistical analysis more challenging. The purpose of this study is to develop a novel imputation method for multivariate count data, motivated by the structure of accelerometer data. We specify the predictive distribution of the missing data as a mixture of zero-inflated Poisson and log-normal distributions, which is shown to be effective in handling the minute-by-minute autocorrelation as well as the under- and over-dispersion of count data. The imputation is performed at the minute level and follows the principles of multiple imputation, using a fully conditional specification with the chained algorithm. To facilitate practical use of this method, we provide an R package, accelmissing. Our method is demonstrated using 2003–2004 National Health and Nutrition Examination Survey data.
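As a rough illustration of the mixture idea described above, the following Python sketch fills missing minutes with draws from a two-component mixture of a zero-inflated Poisson and a rounded log-normal, and repeats the draw to produce several imputed datasets. It is a minimal sketch under stated assumptions, not the method implemented in accelmissing: every function name and parameter value here (`pi_zero`, `lam`, `mu`, `sigma`, `w_zip`) is hypothetical, the parameters are fixed rather than estimated, and the draws ignore the covariates and neighboring minutes that a fully conditional specification would condition on.

```python
import math
import random


def draw_zip(rng, pi_zero, lam):
    """Draw from a zero-inflated Poisson: a structural zero with
    probability pi_zero, otherwise a Poisson(lam) count."""
    if rng.random() < pi_zero:
        return 0
    # Knuth's multiplication method; adequate for small lam.
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def draw_lognormal_count(rng, mu, sigma):
    """Draw a log-normal value and round it to a non-negative count."""
    return int(round(rng.lognormvariate(mu, sigma)))


def impute_minutes(counts, rng, pi_zero=0.4, lam=2.0,
                   mu=5.0, sigma=1.0, w_zip=0.5):
    """Replace None entries with mixture draws; keep observed counts.
    All default parameter values are hypothetical placeholders."""
    filled = []
    for c in counts:
        if c is not None:
            filled.append(c)
        elif rng.random() < w_zip:
            filled.append(draw_zip(rng, pi_zero, lam))
        else:
            filled.append(draw_lognormal_count(rng, mu, sigma))
    return filled


def multiple_impute(counts, m=5, seed=0):
    """Return m independently imputed copies of the count series."""
    rng = random.Random(seed)
    return [impute_minutes(counts, rng) for _ in range(m)]


# Hypothetical minute-level counts; None marks a missing minute.
observed = [0, 12, None, None, 340, None]
imputed_sets = multiple_impute(observed, m=3, seed=1)
```

Each element of `imputed_sets` is one completed series; in a multiple imputation analysis the quantity of interest would be estimated on each completed series and the results pooled.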
