Complex Deep Neural Networks from Large Scale Virtual IMU Data for Effective Human Activity Recognition Using Wearables

Supervised training of human activity recognition (HAR) systems based on body-worn inertial measurement units (IMUs) is often constrained by the small amounts of labeled sample data that are typically available. Systems such as IMUTube employ cross-modality transfer to convert videos of activities of interest into virtual IMU data. We demonstrate for the first time how such large-scale virtual IMU datasets can be used to train HAR systems that are substantially more complex than the state of the art, where complexity is measured by the number of model parameters that can be trained robustly. Our models contain components dedicated to capturing the characteristics of IMU data that are relevant for activity recognition, which increases the number of trainable parameters by a factor of 1100 compared to state-of-the-art model architectures. We evaluate the new model architecture on the challenging task of analyzing free-weight gym exercises, specifically classifying 13 dumbbell exercises. Using IMUTube, we collected around 41 h of virtual IMU data from exercise videos available on YouTube. The proposed model is trained on this large amount of virtual IMU data and calibrated with a mere 36 min of real IMU data. Evaluated on a real IMU dataset, the trained model improves the F1 score by 20% absolute compared to state-of-the-art convolutional models in HAR.
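Because model complexity is expressed here as the number of trainable parameters, a short sketch of how such counts arise may help. The layer sizes below are hypothetical and are not taken from the paper. They only illustrate how widening convolutional layers inflates the parameter count, assuming 3-axis accelerometer input and a 13-class output.

```python
def conv1d_params(in_channels, out_channels, kernel_size, bias=True):
    """Trainable parameters of a 1D convolution: one kernel per (in, out) channel pair."""
    weights = in_channels * out_channels * kernel_size
    return weights + (out_channels if bias else 0)


def dense_params(in_features, out_features, bias=True):
    """Trainable parameters of a fully connected layer."""
    weights = in_features * out_features
    return weights + (out_features if bias else 0)


# Hypothetical small baseline (assumed sizes, not the paper's architecture):
# 3 input channels (x, y, z acceleration), two conv layers, 13 output classes.
baseline_total = sum([
    conv1d_params(3, 16, 5),
    conv1d_params(16, 16, 5),
    dense_params(16, 13),
])

# Hypothetical wider variant with the same layout but 256 channels per layer.
wide_total = sum([
    conv1d_params(3, 256, 5),
    conv1d_params(256, 256, 5),
    dense_params(256, 13),
])

if __name__ == "__main__":
    print("baseline parameters:", baseline_total)
    print("wide parameters:", wide_total)
    print("growth factor: %.1f" % (wide_total / baseline_total))
```

The count is dominated by the product of input and output channels in the middle layers, so widening a model grows its parameter count roughly quadratically in the channel width.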
