论文信息 - B-spline polynomial descriptors for human activity recognition

B-spline polynomial descriptors for human activity recognition

The extraction and quantization of local image and video descriptors for the subsequent creation of visual codebooks is a technique that has proved extremely effective for image and video retrieval applications. In this paper we build on this concept and extract a new set of visual descriptors that are derived from spatiotemporal salient points detected on given image sequences and provide local space-time description of the visual activity. The proposed descriptors are based on the geometrical properties of three-dimensional piecewise polynomials, namely B-splines, that are fitted on the spatiotemporal locations of the salient points that are engulfed within a given spatiotemporal neighborhood. Our descriptors are inherently translation invariant, while the use of the scales of the salient points for the definition of the neighborhood dimensions ensures space-time scaling invariance. Subsequently, a clustering algorithm is used in order to cluster our descriptors across the whole dataset and create a codebook of visual verbs, where each verb corresponds to a cluster center. We use the resulting code- book in a 'bag of verbs' approach in order to recover the pose and short-term motion of subjects at a short set of successive frames, and we use dynamic time warping (DTW) in order to align the sequences in our dataset and structure in time the recovered poses. We define a kernel based on the similarity measure provided by the DTW to classify our examples in a relevance vector machine classification scheme. We present results in a well established human activity database to verify the effectiveness of our method.

[1] Thomas Serre,et al. A Biologically Inspired System for Action Recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[2] I. Patras,et al. Spatiotemporal salient points for visual recognition of human actions , 2005, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[3] M. Brady,et al. Scale Saliency: a novel approach to salient feature and scale selection , 2003 .

[4] Frédéric Jurie,et al. Fast Discriminative Visual Codebooks using Randomized Clustering Forests , 2006, NIPS.

[5] Yi-Ping Hung,et al. Appearance-guided particle filtering for articulated hand tracking , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[6] Cordelia Schmid,et al. Spatial Weighting for Bag-of-Features , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[7] Sidharth Bhatia,et al. Tracking loose-limbed people , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[8] Jitendra Malik,et al. Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.

[9] Ankur Agarwal,et al. Hyperfeatures - Multilevel Local Coding for Visual Recognition , 2006, ECCV.

[10] Eli Shechtman,et al. Matching Local Self-Similarities across Images and Videos , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[11] Ronald Poppe,et al. Vision-based human motion analysis: An overview , 2007, Comput. Vis. Image Underst..

[12] Bill Triggs,et al. Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[13] K. Lempert,et al. CONDENSED 1,3,5-TRIAZEPINES - IV THE SYNTHESIS OF 2,3-DIHYDRO-1H-IMIDAZO-[1,2-a] [1,3,5] BENZOTRIAZEPINES , 1983 .

[14] G LoweDavid,et al. Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[15] Juan Carlos Niebles,et al. A Hierarchical Model of Shape and Appearance for Human Action Classification , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[16] Andrew Zisserman,et al. Video Google: Efficient Visual Search of Videos , 2006, Toward Category-Level Object Recognition.

[17] Ivan Laptev,et al. On Space-Time Interest Points , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[18] Cordelia Schmid,et al. Human Detection Using Oriented Histograms of Flow and Appearance , 2006, ECCV.

[19] Andrew Zisserman,et al. A Boundary-Fragment-Model for Object Detection , 2006, ECCV.

[20] Michael E. Tipping. The Relevance Vector Machine , 1999, NIPS.

[21] Alexei A. Efros,et al. Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[22] Ronen Basri,et al. Actions as space-time shapes , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.