A Multi-Task Learning Approach for Human Activity Segmentation and Ergonomics Risk Assessment

We propose a new approach to Human Activity Evaluation (HAE) in long videos using graph-based multi-task modeling. Previous work on activity evaluation either computes a metric directly from a detected skeleton or uses scene information to regress the activity score. These approaches are insufficient for accurate activity assessment because they compute only an average score over a clip and do not consider the correlation between the joints and body dynamics. Moreover, they are highly scene-dependent, which makes their generalizability questionable. Our multi-task framework for HAE uses a Graph Convolutional Network backbone to embed the interconnections between human joints in the features. Within this framework, we solve the Human Activity Segmentation (HAS) problem as an auxiliary task to improve activity assessment. The HAS head uses an Encoder-Decoder Temporal Convolutional Network to semantically segment long videos into distinct activity classes, whereas the HAE head uses a Long Short-Term Memory (LSTM)-based architecture. We evaluate our method on the UW-IOM and TUM Kitchen datasets and discuss the success and failure cases on both.
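
To illustrate what embedding the interconnections between joints can look like, the following is a minimal sketch of one graph convolution step over a skeleton. It is an assumption-laden illustration, not the backbone described above: the symmetric normalization D^{-1/2}(A + I)D^{-1/2} is the common textbook form of graph convolution, and the five-joint skeleton, the joint coordinates, and the weight matrix are all hypothetical values chosen for readability.

```python
import math

# Hypothetical 5-joint toy skeleton (not the joint layout used in the paper):
# 0 = pelvis, 1 = spine, 2 = head, 3 = left hand, 4 = right hand.
EDGES = [(0, 1), (1, 2), (1, 3), (1, 4)]
NUM_JOINTS = 5


def normalized_adjacency(edges, n):
    """Return D^{-1/2} (A + I) D^{-1/2} as a nested list (assumed normalization)."""
    # Start from the identity so every joint keeps its own features (self-loops).
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        a[i][j] = 1.0
        a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Plain matrix product of two nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


def graph_conv(a_hat, features, weight):
    """One graph convolution layer: ReLU(a_hat @ features @ weight)."""
    mixed = matmul(matmul(a_hat, features), weight)
    return [[max(0.0, v) for v in row] for row in mixed]


a_hat = normalized_adjacency(EDGES, NUM_JOINTS)
# One frame of 2D joint coordinates (made-up values).
joints_xy = [[0.0, 0.0], [0.0, 0.5], [0.0, 1.0], [-0.4, 0.6], [0.4, 0.6]]
# Illustrative weights mapping 2 input channels to 3 output channels.
weight = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]]
embedded = graph_conv(a_hat, joints_xy, weight)  # 5 joints x 3 channels
```

Each row of `embedded` mixes a joint's own coordinates with those of its skeletal neighbors, so the resulting features depend on the body structure as well as on individual joint positions. In a multi-task setup of the kind described above, per-frame features like these would be shared by a segmentation head and an evaluation head.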
