Paper information - Action Recognition from Arbitrary Views using 3D Exemplars

In this paper, we address the problem of learning compact, view-independent, realistic 3D models of human actions recorded with multiple cameras, for the purpose of recognizing those same actions from a single camera or a few cameras, without prior knowledge of the relative orientations between the cameras and the subjects. To this end, we propose a new framework in which actions are modeled using three-dimensional occupancy grids, built from multiple viewpoints, in an exemplar-based hidden Markov model (HMM). The novelty is that a 3D reconstruction is not required during the recognition phase; instead, learned 3D exemplars are used to produce 2D image information that is compared to the observations. Parameters that describe image projections are added as latent variables in the recognition process. In addition, the temporal Markov dependency applied to the view parameters allows them to evolve during recognition, as with a smoothly moving camera. The effectiveness of the framework is demonstrated with experiments on real datasets and with challenging recognition scenarios.
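
The recognition idea in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation, only a toy version under several assumptions that do not come from the paper: the view parameter is reduced to four 90-degree rotations about the vertical axis, the projection is orthographic, the image comparison is an exponential kernel on the fraction of mismatched pixels, and all names (`project_silhouette`, `observation_likelihood`, `forward`) are hypothetical. Each hidden state is a pair (exemplar, view), so the view is inferred together with the exemplar and no 3D reconstruction of the observation is needed.

```python
import numpy as np

def project_silhouette(grid, quarter_turns):
    """Render a 2D silhouette from a 3D occupancy grid (assumed cubic in x, y).

    Assumption: the view is one of four 90-degree rotations about the
    vertical (z) axis, followed by an orthographic projection along x.
    """
    rotated = np.rot90(grid, k=quarter_turns, axes=(0, 1))
    return rotated.any(axis=0)  # shape (ny, nz)

def observation_likelihood(rendered, observed, sigma=0.05):
    """Unnormalized similarity between a rendered and an observed silhouette.

    Assumption: exponential kernel on the fraction of mismatched pixels.
    """
    mismatch = np.mean(rendered != observed)
    return np.exp(-mismatch / sigma)

def forward(exemplars, observations, A_ex, A_view):
    """Scaled forward pass of an HMM whose hidden state is (exemplar, view).

    A_ex[i, j]   : transition probability between exemplars i -> j
    A_view[v, w] : transition probability between views v -> w
    Returns the sequence log-likelihood and the final filtered posterior.
    """
    n_ex, n_views = len(exemplars), A_view.shape[0]
    # Pre-render every exemplar under every candidate view.
    rendered = [[project_silhouette(e, v) for v in range(n_views)]
                for e in exemplars]
    # Uniform prior over (exemplar, view): the orientation is unknown.
    alpha = np.full((n_ex, n_views), 1.0 / (n_ex * n_views))
    log_lik = 0.0
    for t, obs in enumerate(observations):
        if t > 0:
            # Joint transition factorizes into exemplar and view parts.
            alpha = A_ex.T @ alpha @ A_view
        lik = np.array([[observation_likelihood(rendered[i][v], obs)
                         for v in range(n_views)] for i in range(n_ex)])
        alpha = alpha * lik
        norm = alpha.sum()
        log_lik += np.log(norm)
        alpha = alpha / norm
    return log_lik, alpha

# Toy exemplars: a full-height and a half-height column, placed off-centre
# so that each of the four views yields a different silhouette.
standing = np.zeros((4, 4, 4), dtype=bool)
standing[0, 1, :] = True
crouched = np.zeros((4, 4, 4), dtype=bool)
crouched[0, 1, :2] = True

A_ex = np.array([[0.6, 0.4],
                 [0.0, 1.0]])                   # left-to-right action model
A_view = np.full((4, 4), 0.1) + 0.6 * np.eye(4)  # views change slowly

# Observed sequence: both poses seen from view index 1.
observations = [project_silhouette(standing, 1),
                project_silhouette(crouched, 1)]
log_lik, alpha = forward([standing, crouched], observations, A_ex, A_view)
best_exemplar, best_view = np.unravel_index(alpha.argmax(), alpha.shape)
```

In this sketch the sticky `A_view` matrix plays the role of the temporal Markov dependency on the view parameters: the view may drift between frames, but large jumps are penalized. To classify an action, one such model per action class could be evaluated and the class with the highest `log_lik` selected.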
