Substructure and boundary modeling for continuous action recognition

This paper introduces a probabilistic graphical model for continuous action recognition with two novel components: a substructure transition model and a discriminative boundary model. The first component encodes a sparse, global temporal transition prior between action primitives in a state-space model to handle the large spatio-temporal variations within an action class. The second component enforces the action duration constraint discriminatively to locate the transition boundaries between actions more accurately. The two components are integrated into a unified graphical structure to enable effective training and inference. Experimental results on both public and in-house datasets show that, by incorporating information that previous methods had not modeled explicitly or efficiently, the proposed algorithm achieves significantly improved performance for continuous action recognition.
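To make the two ideas concrete, here is a minimal toy sketch. It is not the paper's model: the abstract gives no equations, so the logistic form of the boundary score, the thresholding used to sparsify transitions, and every name and parameter below (`boundary_probability`, `sparsify_transitions`, `typical_duration`, `softness`, `threshold`) are assumptions chosen only for illustration.

```python
import math


def boundary_probability(duration, typical_duration, softness):
    """Illustrative logistic score that an action ends after `duration` frames.

    Assumption: a logistic curve centered on a typical duration. The score
    is 0.5 at `typical_duration` and rises as the action runs longer.
    """
    return 1.0 / (1.0 + math.exp(-(duration - typical_duration) / softness))


def sparsify_transitions(matrix, threshold):
    """Zero out transition weights below `threshold`, then renormalize rows.

    Assumption: simple thresholding stands in for a sparse transition prior
    between action primitives. A row with no surviving weight is left as-is.
    """
    sparse = []
    for row in matrix:
        kept = [w if w >= threshold else 0.0 for w in row]
        total = sum(kept)
        if total > 0:
            sparse.append([w / total for w in kept])
        else:
            sparse.append(list(row))
    return sparse


if __name__ == "__main__":
    # Toy transition weights between three hypothetical action primitives.
    transitions = [
        [0.70, 0.25, 0.05],
        [0.02, 0.90, 0.08],
        [0.30, 0.03, 0.67],
    ]
    for row in sparsify_transitions(transitions, threshold=0.1):
        print([round(w, 3) for w in row])

    # The boundary score grows as the current action outlasts its typical length.
    for frames in (10, 30, 50):
        print(frames, round(boundary_probability(frames, 30, 5), 3))
```

In this toy setup the boundary score depends only on elapsed duration; a discriminative boundary model would presumably also condition on the observations, which the sketch omits.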
