Action video retrieval based on atomic action vocabulary

We propose an efficient action retrieval system built on a novel action representation and an effective video matching method. Actions are represented with a hierarchical encoding scheme: the lowest level measures the motion of local body parts, the middle level encodes instantaneous global body motion, and the highest level describes actions through an atomic action vocabulary. The atomic action vocabulary extends keyframe-based indexing techniques: a long action video is decomposed into a sequence of atomic sub-actions matched from the vocabulary. Video matching is made efficient by precomputing the distances between vocabulary entries, so that the global distance between two video sequences reduces to index lookup operations with minimal additional computation. Combined with the atomic action vocabulary, this enables flexible matching schemes that find compound action sequences of arbitrary length. The proposed approach is evaluated on surveillance video and a public video dataset.
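The lookup-based matching described above can be illustrated with a minimal sketch. It is not the system's actual implementation; it only shows the general idea under several assumptions that the abstract does not specify: vocabulary entries are plain feature vectors, entries are compared with Euclidean distance, the distance between two sequences is the sum of per-position lookups, and a query is located in a longer video with a simple sliding window. The names `pairwise_distances`, `sequence_distance`, and `best_match` are hypothetical.

```python
import math


def pairwise_distances(vocab):
    """Precompute a K x K table of distances between vocabulary descriptors.

    Assumption: Euclidean distance; the source does not name the metric.
    """
    return [[math.dist(a, b) for b in vocab] for a in vocab]


def sequence_distance(table, seq_a, seq_b):
    """Distance between two equal-length sequences of vocabulary indices.

    Each position costs one table lookup; no descriptor is touched again.
    Assumption: positions are aligned one to one and distances are summed.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(table[i][j] for i, j in zip(seq_a, seq_b))


def best_match(table, query, video):
    """Slide the query over a longer index sequence.

    Returns (offset, distance) of the window closest to the query.
    """
    n = len(query)
    if n == 0 or n > len(video):
        raise ValueError("query must be non-empty and no longer than video")
    scored = [
        (sequence_distance(table, query, video[s:s + n]), s)
        for s in range(len(video) - n + 1)
    ]
    distance, offset = min(scored)
    return offset, distance


# Toy vocabulary of three atomic actions (hypothetical 2-D descriptors).
vocab = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
table = pairwise_distances(vocab)

# Videos are stored as sequences of vocabulary indices.
video = [2, 2, 0, 1, 1]
query = [0, 1]

print(best_match(table, query, video))  # prints (2, 0.0)
```

Because the table is built once per vocabulary, comparing a query of length n against a window costs n lookups regardless of the descriptor dimension, and a compound query of any length is handled by the same routine.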
