IMUTube: Automatic extraction of virtual on-body accelerometry from video for human activity recognition

The lack of large-scale, labeled datasets impedes progress in developing robust and generalized predictive models for on-body sensor-based human activity recognition (HAR). Labeled HAR data is scarce because collecting sensor data is expensive and annotating it is time-consuming and error-prone. To address this problem, we introduce IMUTube, an automated processing pipeline that integrates existing computer vision and signal processing techniques to convert videos of human activity into virtual streams of IMU data. These virtual IMU streams represent accelerometry at a wide variety of locations on the human body. We show that the virtually generated IMU data improves the performance of a variety of models on known HAR datasets. Our initial results are promising, but the greater promise of this work lies in a collective effort by the computer vision, signal processing, and activity recognition communities to extend it in the directions we outline. Such an effort should make on-body, sensor-based HAR another success story among the recognition breakthroughs driven by large datasets.
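The abstract states that video is converted into virtual accelerometry but does not spell out the signal-processing details. As a hedged illustration of one step such a pipeline plausibly needs, the sketch below assumes that 3D joint positions (in metres, in a world frame whose z axis points up) and per-frame sensor orientations have already been estimated from video. It then turns them into accelerometer readings: acceleration by central finite differences, gravity removed to obtain specific force, and rotation into the sensor frame. The function names, frame conventions, frame rate, and the 9.81 m/s² constant are assumptions made for illustration; this is not IMUTube's implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical types for this sketch: a 3-vector and a 3x3 rotation (row-major).
Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]

# Assumed world frame: z axis points up, so gravity points along -z (m/s^2).
GRAVITY_WORLD: Vec3 = (0.0, 0.0, -9.81)
IDENTITY: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))


def second_derivative(positions: Sequence[Vec3], fps: float) -> List[Vec3]:
    """Central difference a_t = (p[t+1] - 2 p[t] + p[t-1]) / dt^2.

    Returns one acceleration per interior frame (len(positions) - 2 values).
    """
    if len(positions) < 3:
        raise ValueError("need at least 3 frames")
    dt2 = (1.0 / fps) ** 2
    return [
        ((n[0] - 2.0 * c[0] + p[0]) / dt2,
         (n[1] - 2.0 * c[1] + p[1]) / dt2,
         (n[2] - 2.0 * c[2] + p[2]) / dt2)
        for p, c, n in zip(positions, positions[1:], positions[2:])
    ]


def world_to_sensor(r_ws: Mat3, v: Vec3) -> Vec3:
    """Express a world-frame vector in the sensor frame.

    r_ws rotates sensor-frame vectors into the world frame (v_w = R v_s),
    so the inverse mapping is its transpose: v_s = R^T v_w.
    """
    return (
        r_ws[0][0] * v[0] + r_ws[1][0] * v[1] + r_ws[2][0] * v[2],
        r_ws[0][1] * v[0] + r_ws[1][1] * v[1] + r_ws[2][1] * v[2],
        r_ws[0][2] * v[0] + r_ws[1][2] * v[1] + r_ws[2][2] * v[2],
    )


def virtual_accelerometer(
    positions: Sequence[Vec3], orientations: Sequence[Mat3], fps: float
) -> List[Vec3]:
    """Simulated accelerometer output for one body location.

    An accelerometer measures specific force f = a - g (not a itself),
    expressed in its own frame, so a device at rest reads +9.81 on the up axis.
    """
    if len(orientations) != len(positions):
        raise ValueError("need one orientation per frame")
    readings = []
    # Accelerations exist only for interior frames, so skip the first/last orientation.
    for a, r_ws in zip(second_derivative(positions, fps), orientations[1:-1]):
        f_world = (a[0] - GRAVITY_WORLD[0], a[1] - GRAVITY_WORLD[1], a[2] - GRAVITY_WORLD[2])
        readings.append(world_to_sensor(r_ws, f_world))
    return readings


# Usage: a joint held still for 5 frames at 30 fps, sensor aligned with the world.
still = virtual_accelerometer([(0.3, 0.2, 1.5)] * 5, [IDENTITY] * 5, fps=30.0)
print(still)  # three readings, each (0.0, 0.0, 9.81)
```

Subtracting gravity before rotating matters: a real accelerometer at rest is not silent, so a virtual one that reported only the second derivative of position would miss the constant gravity component that real sensor data always contains. In practice, the finite-difference step also amplifies pose-estimation jitter, so a smoothing filter on the trajectories would likely be needed before differentiation.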
