A Multi-Task Learning Approach for Human Activity Segmentation and Ergonomics Risk Assessment

We propose a new approach to Human Activity Evaluation (HAE) in long videos using graph-based multi-task modeling. Previous works in activity evaluation either compute a metric directly from a detected skeleton or use scene information to regress an activity score. These approaches are insufficient for accurate activity assessment because they only compute an average score over a clip and do not consider the correlations between joints or the dynamics of the body. Moreover, they are highly scene-dependent, which makes their generalizability questionable. We propose a novel multi-task framework for HAE that uses a Graph Convolutional Network (GCN) backbone to embed the interconnections between human joints in the features. In this framework, we solve Human Activity Segmentation (HAS) as an auxiliary task to improve activity assessment. The HAS head is an Encoder-Decoder Temporal Convolutional Network that semantically segments long videos into distinct activity classes, whereas the HAE head uses a Long Short-Term Memory (LSTM)-based architecture. We evaluate our method on the UW-IOM and TUM Kitchen datasets and discuss success and failure cases on both.
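The abstract does not specify the exact graph-convolution variant, the losses of the two heads, or how the tasks are weighted. The following is a minimal plain-Python sketch of two of those ingredients under stated assumptions, not the authors' implementation: a generic spatial graph convolution over a skeleton using the symmetric normalization D^{-1/2}(A + I)D^{-1/2}, and a multi-task loss that adds a frame-wise squared error for the main HAE task to a weighted frame-wise cross-entropy for the auxiliary HAS task. The function names `normalized_adjacency`, `graph_conv`, and `multitask_loss` and the weight `lam` are hypothetical; the temporal convolutional and LSTM heads are not shown.

```python
import math


def normalized_adjacency(num_joints, edges):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    # Self-loops (the identity) let each joint keep its own feature.
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)] for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def matmul(x, y):
    """Dense matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def graph_conv(a_hat, features, weight):
    """One graph-convolution layer on a single frame: ReLU(A_hat @ X @ W).

    features: one row per joint; weight: (in_features x out_features).
    """
    out = matmul(matmul(a_hat, features), weight)
    return [[max(0.0, v) for v in row] for row in out]


def multitask_loss(seg_logits, seg_labels, score_pred, score_true, lam=1.0):
    """Assumed loss: mean squared error of per-frame scores (HAE)
    plus lam times mean per-frame cross-entropy over activity classes (HAS)."""
    ce = 0.0
    for logits, label in zip(seg_logits, seg_labels):
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(v - m) for v in logits))
        ce += log_z - logits[label]
    ce /= len(seg_labels)
    mse = sum((p - t) ** 2 for p, t in zip(score_pred, score_true)) / len(score_true)
    return mse + lam * ce
```

For example, a hypothetical three-joint chain such as shoulder, elbow, and wrist would be `normalized_adjacency(3, [(0, 1), (1, 2)])`; applying `graph_conv` to it mixes each joint's feature with those of its immediate neighbors, which is how a GCN backbone encodes joint interconnections. Treating HAS as auxiliary corresponds here to keeping `lam` as a tunable weight on the segmentation term.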