A Multi-Task Learning Approach for Human Action Detection and Ergonomics Risk Assessment

We propose a new approach to Human Action Evaluation (HAE) in long videos using graph-based multi-task modeling. Previous works in activity assessment either compute a metric directly from a detected skeleton or use scene information to regress the activity score. These approaches are insufficient for accurate activity assessment because they compute only an average score over a clip and do not consider the correlation between the joints and body dynamics. Moreover, they are highly scene-dependent, which makes their generalizability questionable. Our multi-task framework for HAE uses a Graph Convolutional Network (GCN) backbone to embed the interconnections between human joints in the features. In this framework, we solve the Human Action Detection (HAD) problem as an auxiliary task to improve activity assessment. The HAD head uses an Encoder-Decoder Temporal Convolutional Network to detect activities in long videos, and the HAE head uses a Long Short-Term Memory (LSTM)-based architecture. We evaluate our method on the UW-IOM and TUM Kitchen datasets and discuss the success and failure cases on these two datasets.
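
As a rough illustration of the multi-task setup described above, a shared backbone with two heads could be trained with a weighted joint objective. This is only a sketch under stated assumptions: the symbols $f_\theta$, $g_\phi$, $h_\psi$, the weight $\lambda$, the per-frame outputs, and the specific loss terms (squared error for HAE, cross-entropy for HAD) are all hypothetical choices made for illustration and are not taken from the paper.

```latex
\begin{aligned}
Z &= f_\theta(X, A)
  && \text{GCN backbone on skeleton sequence } X \text{ with joint adjacency } A \\
\hat{y}_{1:T} &= g_\phi(Z)
  && \text{HAD head (encoder-decoder TCN): action label per frame} \\
\hat{s}_{1:T} &= h_\psi(Z)
  && \text{HAE head (LSTM): assessment score per frame} \\
\mathcal{L} &=
  \underbrace{\frac{1}{T}\sum_{t=1}^{T}\bigl(\hat{s}_t - s_t\bigr)^2}_{\mathcal{L}_{\mathrm{HAE}}}
  \;+\; \lambda\,
  \underbrace{\frac{1}{T}\sum_{t=1}^{T}\mathrm{CE}\bigl(\hat{y}_t, y_t\bigr)}_{\mathcal{L}_{\mathrm{HAD}}}
\end{aligned}
```

Under this assumed formulation, gradients from both terms flow into the shared parameters $\theta$, which is how an auxiliary detection task can shape the features used for assessment; setting $\lambda = 0$ would recover a single-task HAE model.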
