Assessing the Quality of Actions

While recent advances in computer vision have provided reliable methods to recognize actions in both images and videos, the problem of assessing how well people perform actions has been largely unexplored. Since methods for assessing action quality have many real-world applications in healthcare, sports, and video retrieval, we believe the computer vision community should begin to tackle this challenging problem. To spur progress, we introduce a learning-based framework that takes steps towards assessing how well people perform actions in videos. Our approach trains a regression model that maps spatiotemporal pose features to scores obtained from expert judges. Moreover, our approach can provide interpretable feedback on how people can improve their actions. We evaluate our method on a new Olympic sports dataset, and our experiments suggest that our framework ranks the athletes more accurately than a non-expert human. While promising, our method is still a long way from rivaling the performance of expert judges, indicating that there is significant opportunity in computer vision research to improve on this difficult yet important task.
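
The pipeline described above (pose features in, judge scores out, evaluated by ranking) can be illustrated with a minimal sketch. The abstract does not say which features or which regressor are used, so everything here is an assumption made for illustration: low-frequency DCT magnitudes of each joint trajectory stand in for the spatiotemporal pose features, a closed-form ridge regression stands in for the regression model, and Spearman rank correlation (without tie handling) stands in for the ranking evaluation. All function names and the synthetic data are hypothetical.

```python
import math
import random


def dct_features(trajectory, k):
    """Magnitudes of the first k DCT-II coefficients of one joint coordinate."""
    n = len(trajectory)
    feats = []
    for f in range(k):
        c = sum(x * math.cos(math.pi * (t + 0.5) * f / n)
                for t, x in enumerate(trajectory))
        feats.append(abs(c) / n)
    return feats


def video_features(joint_tracks, k=4):
    """Concatenate the per-joint DCT features into one vector per video."""
    feats = []
    for track in joint_tracks:
        feats.extend(dct_features(track, k))
    return feats


def fit_ridge(X, y, lam=1e-3):
    """Ridge regression with a bias column (assumed regressor, not the paper's).

    Solves (A^T A + lam I) w = A^T y by Gaussian elimination; for simplicity
    the bias weight is regularized as well.
    """
    A = [row + [1.0] for row in X]
    d = len(A[0])
    M = [[sum(a[i] * a[j] for a in A) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(a[i] * t for a, t in zip(A, y)) for i in range(d)]
    for col in range(d):
        # Partial pivoting for numerical stability.
        piv = max(range(col, d), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            factor = M[r][col] / M[col][col]
            for c in range(col, d):
                M[r][c] -= factor * M[col][c]
            b[r] -= factor * b[col]
    w = [0.0] * d
    for i in reversed(range(d)):
        w[i] = (b[i] - sum(M[i][j] * w[j] for j in range(i + 1, d))) / M[i][i]
    return w


def predict(w, x):
    """Predicted score: dot product with the feature weights plus the bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


def ranks(values):
    """Rank positions (0 = smallest); assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r


def spearman(a, b):
    """Spearman rank correlation for tie-free data."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1 - 6 * d2 / (n * (n * n - 1))


if __name__ == "__main__":
    # Synthetic stand-in data: the hidden "quality" scales the motion amplitude.
    rng = random.Random(0)
    videos, scores = [], []
    for _ in range(20):
        q = rng.uniform(0, 10)
        tracks = [[q * math.sin(0.4 * t + j) + rng.gauss(0, 0.1)
                   for t in range(16)] for j in range(2)]
        videos.append(video_features(tracks))
        scores.append(q)
    w = fit_ridge(videos[:15], scores[:15])
    predicted = [predict(w, x) for x in videos[15:]]
    print("rank correlation on held-out clips:", spearman(predicted, scores[15:]))
```

In this sketch the pose tracks would come from a pose estimator run on each frame and the targets would be the judges' scores; swapping the ridge solver for a support vector regressor would change only `fit_ridge` and `predict`.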
