Measuring and reducing observational latency when recognizing actions

An important aspect in interactive, action-based interfaces is the latency in recognizing the action. High latency will cause the system's feedback to lag behind user actions, reducing the overall quality of the user experience. This paper presents a novel dataset and algorithms for reducing the latency in recognizing the action. Latency in classification is minimized with a classifier based on logistic regression that uses canonical poses to identify the action. The classifier is trained from the dataset using a learning formulation that makes it possible to train the classifier to reduce latency. The classifier is compared against both a Bag of Words and a Conditional Random Field classifier and is found to be superior in both pre-segmented and on-line classification tasks.

[1]  David A. McAllester,et al.  Cascade object detection with deformable part models , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[2]  Olga Veksler,et al.  Fast approximate energy minimization via graph cuts , 2001, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[3]  Paul A. Viola,et al.  Robust Real-time Object Detection , 2001 .

[4]  David A. Forsyth,et al.  Strike a pose: tracking people by finding stylized poses , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[5]  Joseph J. LaViola,et al.  Exploring the Trade-off Between Accuracy and Observational Latency in Action Recognition , 2013, International Journal of Computer Vision.

[6]  Yang Wang,et al.  Recognizing human actions from still images with latent poses , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[7]  Mubarak Shah,et al.  Learning human actions via information maximization , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Zicheng Liu,et al.  Cross-dataset action detection , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[9]  Luc Van Gool,et al.  Action snippets: How many frames does human action recognition require? , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Hassan Foroosh,et al.  View-Invariant Action Recognition from Point Triplets , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[12]  James W. Davis,et al.  Minimal-latency human action recognition using reliable-inference , 2006, Image Vis. Comput..

[13]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion , 2010, International Journal of Computer Vision.

[14]  Stan Sclaroff,et al.  A Unified Framework for Gesture Recognition and Spatiotemporal Gesture Segmentation , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Mubarak Shah,et al.  Human Action Recognition in Videos Using Kinematic Features and Multiple Instance Learning , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Ying Wu,et al.  Discriminative subvolume search for efficient action detection , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Michael J. Black,et al.  Estimating human shape and pose from a single image , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[18]  Paul A. Viola,et al.  Online decoding of Markov models under latency constraints , 2006, ICML.

[19]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[20]  Joseph J. LaViola,et al.  Exploring strategies and guidelines for developing full body video game interfaces , 2010, FDG.

[21]  Wanqing Li,et al.  Action recognition based on a bag of 3D points , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[22]  Andrew McCallum,et al.  Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.