AIFit: Automatic 3D Human-Interpretable Feedback Models for Fitness Training

I went to the gym today, but how well did I do? And where should I improve? Ah, my back hurts slightly... User engagement can be sustained and injuries avoided by reconstructing 3D human pose and motion, relating it to good training practices, identifying errors, and providing early, real-time feedback. In this paper, we introduce AIFit, the first automatic system that performs 3D human sensing for fitness training. The system can be used at home, outdoors, or at the gym. AIFit reconstructs 3D human pose, shape, and motion, reliably segments exercise repetitions, and identifies in real time the deviations between standards learnt from trainers and the execution of a trainee. This makes possible localized, quantitative feedback that supports correct execution of exercises, reduced risk of injury, and continuous improvement. To support research and evaluation, we introduce the first large-scale dataset, Fit3D, containing over 3 million images and corresponding 3D human shape and motion capture ground truth configurations, with over 37 repeated exercises, covering all the major muscle groups, performed by instructors and trainees. Our statistical coach is governed by a global parameter that captures how critical it should be of a trainee's performance. This parameter helps the coach adapt to a student's level of fitness (e.g. beginner vs. advanced vs. expert), or to the expected accuracy of a 3D pose reconstruction method. We show that, for different values of the global parameter, our feedback system based on 3D pose estimates achieves good accuracy compared to the one based on ground-truth motion capture. Our statistical coach offers feedback in natural language, with spatio-temporal visual grounding.
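To make the idea of a coach governed by a single global parameter concrete, the following is a minimal sketch and not the AIFit implementation. It assumes that repetitions are already segmented and time-aligned, that each frame is summarized by named joint angles in degrees, and that the "standard" is a per-frame mean and standard deviation over trainer repetitions. The names `fit_standard`, `find_deviations`, `tolerance`, and `min_std` are hypothetical. In this sketch a smaller `tolerance` makes the coach more critical; the paper's actual parameter may be defined differently.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Deviation:
    frame: int        # index of the frame within the repetition
    joint: str        # name of the joint angle that deviates
    observed: float   # trainee angle (degrees)
    expected: float   # trainer mean angle (degrees)
    z_score: float    # signed deviation in trainer standard deviations


def fit_standard(trainer_reps):
    """Per-frame, per-joint (mean, std) over time-aligned trainer repetitions.

    trainer_reps: list of repetitions; each repetition is a list of frames;
    each frame is a dict mapping joint name -> angle in degrees.
    Assumes at least two repetitions of equal length (hypothetical setup).
    """
    n_frames = len(trainer_reps[0])
    standard = []
    for t in range(n_frames):
        stats = {}
        for joint in trainer_reps[0][t]:
            values = [rep[t][joint] for rep in trainer_reps]
            stats[joint] = (mean(values), stdev(values))
        standard.append(stats)
    return standard


def find_deviations(standard, trainee_rep, tolerance, min_std=1.0):
    """Flag frame/joint pairs further than `tolerance` stds from the trainer mean.

    `tolerance` plays the role of the global parameter in this sketch:
    lower values yield a more critical coach. `min_std` guards against
    near-zero trainer variance inflating the score.
    """
    deviations = []
    for t, (stats, frame) in enumerate(zip(standard, trainee_rep)):
        for joint, (mu, sigma) in stats.items():
            z = (frame[joint] - mu) / max(sigma, min_std)
            if abs(z) > tolerance:
                deviations.append(Deviation(t, joint, frame[joint], mu, z))
    return deviations


# Toy data: three trainer squat repetitions, three frames each (made-up values).
trainer_reps = [
    [{"knee": 170.0}, {"knee": 95.0}, {"knee": 168.0}],
    [{"knee": 172.0}, {"knee": 90.0}, {"knee": 171.0}],
    [{"knee": 168.0}, {"knee": 85.0}, {"knee": 170.0}],
]
trainee_rep = [{"knee": 171.0}, {"knee": 110.0}, {"knee": 169.0}]

standard = fit_standard(trainer_reps)
strict = find_deviations(standard, trainee_rep, tolerance=2.0)
lenient = find_deviations(standard, trainee_rep, tolerance=5.0)

for d in strict:
    print(f"frame {d.frame}: {d.joint} is {d.observed - d.expected:+.1f} deg "
          f"from the trainer mean ({d.z_score:+.1f} std)")
```

With the toy data, the strict setting flags the trainee's shallow knee bend in the middle frame, while the lenient setting reports nothing. A setup like this could use a lenient tolerance for beginners or for noisier 3D pose estimates, and a strict one for advanced trainees or motion capture input.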
