Human Action Recognition Using Distribution of Oriented Rectangular Patches

We describe a "bag-of-rectangles" method for representing and recognizing human actions in videos. In this method, each human pose in an action sequence is represented by oriented rectangular patches extracted over the whole body. Then, spatial oriented histograms are formed to represent the distribution of these rectangular patches. In order to carry the information from the spatial domain described by the bag-of-rectangles descriptor to temporal domain for recognition of the actions, four different methods are proposed. These are namely, (i) frame by frame voting, which recognizes the actions by matching the descriptors of each frame, (ii) global histogramming, which extends the idea of Motion Energy Image proposed by Bobick and Davis by rectangular patches, (iii) a classifier based approach using SVMs, and (iv) adaptation of Dynamic Time Warping on the temporal representation of the descriptor. The detailed experiments are carried out on the action dataset of Blank et. al. High success rates (100%) prove that with a very simple and compact representation, we can achieve robust recognition of human actions, compared to complex representations.

[1]  Randal C. Nelson,et al.  Detecting activities , 1993, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[2]  William T. Freeman,et al.  Orientation Histograms for Hand Gesture Recognition , 1995 .

[3]  David A. Forsyth,et al.  Body plans , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  Alex Pentland,et al.  Coupled hidden Markov models for complex action recognition , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[5]  Claudio S. Pinhanez,et al.  Human action detection using PNF propagation of temporal constraints , 1998, Proceedings. 1998 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No.98CB36231).

[6]  Aaron F. Bobick,et al.  Parametric Hidden Markov Models for Gesture Recognition , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Thomas S. Huang,et al.  Gesture modeling and recognition using finite state machines , 2000, Proceedings Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580).

[8]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Jeffrey Mark Siskind,et al.  Reconstructing force-dynamic models from video sequences , 2003, Artif. Intell..

[10]  Jitendra Malik,et al.  Recognizing action at a distance , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[11]  Ramakant Nevatia,et al.  Video-based event recognition: activity representation and probabilistic recognition methods , 2004, Comput. Vis. Image Underst..

[12]  Tieniu Tan,et al.  A survey on visual surveillance of object motion and behaviors , 2004, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[13]  Jitendra Malik,et al.  Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons , 2001, International Journal of Computer Vision.

[14]  Eric Horvitz,et al.  Layered representations for learning and inferring office activity from multiple sensory channels , 2004, Comput. Vis. Image Underst..

[15]  Leonidas J. Guibas,et al.  The Earth Mover's Distance as a Metric for Image Retrieval , 2000, International Journal of Computer Vision.

[16]  David A. Forsyth,et al.  Strike a pose: tracking people by finding stylized poses , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[17]  Ronen Basri,et al.  Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[18]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[19]  Cristian Sminchisescu,et al.  Conditional models for contextual human motion recognition , 2006, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[20]  Alexei A. Efros,et al.  Discovering object categories in image collections , 2005 .

[21]  David A. Forsyth,et al.  Computational Studies of Human Motion: Part 1, Tracking and Motion Synthesis , 2005, Found. Trends Comput. Graph. Vis..

[22]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[23]  Haibin Ling,et al.  Diffusion Distance for Histogram Comparison , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[24]  Daniel Gatica-Perez,et al.  Modeling Semantic Aspects for Cross-Media Image Indexing , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Chong-Wah Ngo,et al.  Towards optimal bag-of-features for object categorization and semantic video retrieval , 2007, CIVR '07.

[26]  Juan Carlos Niebles,et al.  A Hierarchical Model of Shape and Appearance for Human Action Classification , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  David A. Forsyth,et al.  Searching Video for Complex Activities with Finite State Models , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.