Approaching the Real-World

Recently, IMUTube introduced a paradigm change for bootstrapping human activity recognition (HAR) systems for wearables. The key idea is to use videos of activities to support the training of activity recognizers based on inertial measurement units (IMUs). The system retrieves videos from public repositories and generates virtual IMU data from them. The ultimate vision for such a system is to make large amounts of weakly labeled videos accessible for model training in HAR and thereby overcome one of the most pressing issues in the field: the lack of significant amounts of labeled sample data. In this paper, we present the first detailed exploration of IMUTube in a realistic assessment scenario: the analysis of free-weight gym exercises. We make significant progress towards a flexible, fully functional IMUTube system by extending it to handle a range of artifacts that are common in unrestricted online videos, including various forms of video noise, non-human poses, body part occlusions, and extreme camera and human motion. By overcoming these real-world challenges, we are able to generate high-quality virtual IMU data, which allows us to employ IMUTube for practical analysis tasks. We show that HAR systems trained with virtual sensor data generated by IMUTube in addition to real IMU data significantly outperform baseline models trained only with real IMU data. In doing so, we demonstrate the practical utility of IMUTube and the progress made towards the final vision of the new bootstrapping paradigm.
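
To make the notion of "virtual IMU data" concrete, the following is a minimal sketch, not the IMUTube implementation. It assumes that a 3D trajectory of one body joint (in metres, in a fixed world frame) has already been recovered from a video, and it approximates the linear acceleration at that joint with a second-order central difference. The function name `virtual_acceleration` and its parameters are hypothetical, and the sketch omits gravity, sensor orientation, and any calibration against real sensors.

```python
import numpy as np


def virtual_acceleration(positions, sample_rate_hz):
    """Approximate linear acceleration (m/s^2) from a joint trajectory.

    positions: array of shape (T, 3) with T >= 3, in metres (assumed input).
    sample_rate_hz: video frame rate in Hz (assumed constant).
    """
    positions = np.asarray(positions, dtype=float)
    if positions.ndim != 2 or positions.shape[0] < 3:
        raise ValueError("need a (T, 3) trajectory with at least 3 frames")
    dt = 1.0 / sample_rate_hz
    accel = np.zeros_like(positions)
    # Second-order central difference for all interior frames.
    accel[1:-1] = (positions[2:] - 2.0 * positions[1:-1] + positions[:-2]) / dt**2
    # Endpoints have no two-sided neighbours; reuse the nearest interior estimate.
    accel[0] = accel[1]
    accel[-1] = accel[-2]
    return accel


if __name__ == "__main__":
    # Synthetic check: a trajectory with constant acceleration.
    fs = 30.0
    t = np.arange(50) / fs
    a_true = np.array([1.0, -2.0, 0.5])
    trajectory = 0.5 * t[:, None] ** 2 * a_true
    print(virtual_acceleration(trajectory, fs)[:3])
```

In practice, joint trajectories estimated from unrestricted videos are noisy, so differentiating them twice amplifies errors; this is one reason why artifacts such as occlusions and camera motion need to be handled before virtual sensor data becomes usable.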
