Evaluating and Enhancing the Generalization Performance of Machine Learning Models for Physical Activity Intensity Prediction From Raw Acceleration Data

Purpose: To evaluate and enhance the generalization performance of machine learning models that predict physical activity intensity from raw acceleration data when applied to populations monitored with different activity monitors. Method: Five datasets from four studies, each containing only hip- or wrist-based raw acceleration data (two hip-based and three wrist-based), were extracted. The five datasets were then used to develop and validate artificial neural networks (ANNs) in three setups to classify activity intensity categories (sedentary behavior, light, and moderate-to-vigorous). To examine generalizability, the ANN models were developed using within-dataset (leave-one-subject-out) cross-validation and then cross-tested on the other datasets, which were collected with different accelerometers. To enhance the models’ generalizability, four of the five datasets were combined for training and the fifth was used for validation. Finally, all five datasets were merged to develop a single model that generalizes across the datasets (50% of the subjects from each dataset for training, the remainder for validation). Results: The models showed high performance in within-dataset cross-validation (accuracy 71.9–95.4%, Kappa K = 0.63–0.94). The performance of the within-dataset validated models decreased when they were applied to datasets with different accelerometers (accuracy 41.2–59.9%, K = 0.21–0.48). Models trained on merged datasets consisting of hip and wrist data predicted the left-out dataset with acceptable performance (accuracy 65.9–83.7%, K = 0.61–0.79). The model trained with all five datasets performed acceptably across the datasets (accuracy 80.4–90.7%, K = 0.68–0.89). Conclusions: Integrating heterogeneous datasets into training sets seems a viable approach for enhancing the generalization performance of the models. In contrast, within-dataset validation alone is not sufficient to understand the models’ performance on other populations with different accelerometers.
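The three validation setups described in the Method can be illustrated with a minimal sketch of how subjects and datasets might be partitioned. This is not the authors' code: the function names, the dataset names, and the subject identifiers are hypothetical, and the ANN training itself is omitted. The agreement statistic is written as unweighted Cohen's kappa, which is an assumption, since the abstract only says "Kappa".

```python
import random


def loso_splits(subjects):
    """Setup 1 (within dataset): leave-one-subject-out folds.

    Yields (training_subjects, held_out_subjects) once per subject.
    """
    subjects = sorted(set(subjects))
    for held_out in subjects:
        yield [s for s in subjects if s != held_out], [held_out]


def leave_one_dataset_out(datasets):
    """Setup 2: train on all datasets but one, validate on the left-out one.

    `datasets` maps a (hypothetical) dataset name to its subject IDs.
    Yields (training_dataset_names, validation_dataset_name).
    """
    names = sorted(datasets)
    for held_out in names:
        yield [n for n in names if n != held_out], held_out


def half_split_per_dataset(datasets, seed=0):
    """Setup 3: 50% of the subjects of each dataset for training,
    the remaining subjects for validation (split by subject, not by sample).
    """
    rng = random.Random(seed)
    train, valid = {}, {}
    for name in sorted(datasets):
        subs = sorted(set(datasets[name]))
        rng.shuffle(subs)
        cut = len(subs) // 2
        train[name] = sorted(subs[:cut])
        valid[name] = sorted(subs[cut:])
    return train, valid


def accuracy(y_true, y_pred):
    """Fraction of windows whose predicted intensity matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa (assumed variant): chance-corrected agreement."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    p_observed = accuracy(y_true, y_pred)
    p_expected = sum(
        (list(y_true).count(c) / n) * (list(y_pred).count(c) / n) for c in labels
    )
    if p_expected == 1.0:  # both raters use a single, identical class
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical subject IDs for two hip-based and three wrist-based datasets.
    example = {
        "hip_A": ["a1", "a2", "a3", "a4"],
        "hip_B": ["b1", "b2", "b3", "b4"],
        "wrist_C": ["c1", "c2", "c3", "c4"],
        "wrist_D": ["d1", "d2", "d3", "d4"],
        "wrist_E": ["e1", "e2", "e3", "e4"],
    }
    for train_names, held_out in leave_one_dataset_out(example):
        print("train on", train_names, "-> validate on", held_out)
```

Splitting by subject rather than by sample keeps all windows from one person on the same side of the split, which is the reason leave-one-subject-out is used for the within-dataset setup.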
