End-To-End Learning for Action Quality Assessment

Action quality assessment has attracted increasing attention from computer vision researchers. In this paper, we propose an end-to-end framework based on a fragment-based 3D convolutional neural network to assess action quality in videos. The loss function combines a ranking loss with the mean squared error (MSE), so that optimization accounts for both the score values and their relative ranking. Training thus narrows the gap between the predictions and the ground-truth scores while also making the predictions satisfy the ranking constraint. The proposed network learns the evaluation criteria of actions and works well with limited training data. Extensive experiments on three public datasets show that our method achieves state-of-the-art results.
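As an illustration of how a ranking term can be combined with the MSE, the following is a minimal sketch and not the exact formulation used in the paper. It assumes a pairwise hinge-style ranking loss, and the names `alpha` (weight of the ranking term) and `margin` are hypothetical parameters introduced only for this example.

```python
from itertools import combinations


def mse_loss(pred, target):
    """Mean squared error between predicted and ground-truth scores."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def pairwise_ranking_loss(pred, target, margin=0.0):
    """Hinge-style penalty for pairs whose predicted order contradicts
    the ground-truth order (assumed form, for illustration only)."""
    pairs = list(combinations(range(len(pred)), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        if target[i] == target[j]:
            continue  # tied ground-truth scores impose no ordering
        sign = 1.0 if target[i] > target[j] else -1.0
        # Penalize when the predicted difference disagrees with the sign.
        total += max(0.0, margin - sign * (pred[i] - pred[j]))
    return total / len(pairs)


def combined_loss(pred, target, alpha=1.0, margin=0.0):
    """MSE plus a weighted ranking term; alpha is a hypothetical weight."""
    return mse_loss(pred, target) + alpha * pairwise_ranking_loss(pred, target, margin)
```

With this sketch, predictions that match the ground truth incur zero loss, while two videos scored in the wrong order are penalized by both terms: the MSE for the value error and the ranking term for the inverted order.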
