Action Points: A Representation for Low-latency Online Human Action Recognition

Applications of human action recognition in interactive systems such as games require robust, real-time recognition of human actions at low latency from a stream of observations. Current paradigms of action recognition either classify a pre-segmented sequence as a whole, or label a range of frames as an action and evaluate performance with a frame-by-frame measure. We argue that both paradigms are limited when addressing latency requirements. Instead, we propose the notion of “action points” to serve as natural temporal anchors of simple human actions. Action points enable latency-aware training and evaluation of online recognition systems. To demonstrate the usefulness of action points, we show how two different systems, a Hidden Markov Model and a direct classification approach, can be used with action point annotations. We evaluate our approach on two data sets with different input modalities and show that the action point abstraction is useful in settings where human action recognition has to be performed online and at low latency.
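To make the idea of latency-aware evaluation concrete, the following is a minimal sketch of how detections might be scored against action point annotations. It assumes each action point is a single annotated frame with a label, and that a detection counts as correct only if it falls within a fixed tolerance (in frames) of an unmatched annotation with the same label. The names `Event`, `match_action_points`, `f_score`, and `tolerance`, as well as the greedy nearest-match strategy, are illustrative assumptions and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A labelled point in time (hypothetical type for this sketch)."""
    frame: int
    label: str


def match_action_points(annotations, detections, tolerance):
    """Greedily match detections to annotated action points.

    A detection is a true positive if an unused annotation with the same
    label lies within `tolerance` frames of it. Each annotation can be
    matched at most once. Returns (true_pos, false_pos, false_neg).
    """
    used = set()
    true_pos = 0
    for det in sorted(detections, key=lambda e: e.frame):
        best = None  # (distance, annotation index) of the closest candidate
        for i, ann in enumerate(annotations):
            if i in used or ann.label != det.label:
                continue
            dist = abs(ann.frame - det.frame)
            if dist <= tolerance and (best is None or dist < best[0]):
                best = (dist, i)
        if best is not None:
            used.add(best[1])
            true_pos += 1
    false_pos = len(detections) - true_pos
    false_neg = len(annotations) - true_pos
    return true_pos, false_pos, false_neg


def f_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    annotations = [Event(30, "punch"), Event(100, "kick")]
    detections = [Event(33, "punch"), Event(120, "kick"), Event(200, "punch")]
    for tol in (10, 20):
        counts = match_action_points(annotations, detections, tol)
        print(tol, counts, round(f_score(*counts), 3))
```

Under these assumptions, tightening the tolerance penalizes detections that fire too late (or too early), which a frame-by-frame measure would not distinguish from a well-timed detection.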
