Action Disambiguation Analysis Using Normalized Google-Like Distance Correlogram

Classifying realistic human actions in video remains challenging because of intra-class variability and inter-class ambiguity among action classes. Recently, local features based on Spatial-Temporal Interest Points (STIP) have shown great promise in complex action analysis. However, these methods typically rely on the Bag-of-Words (BoW) model, which can hardly resolve ambiguity between actions because it ignores the spatial-temporal occurrence relations of visual words. In this paper, we propose a new model that captures this contextual relationship in terms of the co-occurrence of pairwise features. Normalized Google-Like Distance (NGLD) is proposed to measure this co-occurrence numerically, owing to its effectiveness in semantic correlation analysis. All pairwise distances compose an NGLD correlogram, and its normalized form is incorporated into the final action representation. In experiments on the WEIZMANN dataset and the more challenging UCF sports dataset, it proves to be a much richer descriptor, observably reducing action ambiguity. Results also demonstrate that the proposed model is more effective and more robust than BoW under different setups.
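The abstract does not give the NGLD formula, so the following is only a rough sketch of how a pairwise co-occurrence correlogram of this kind could be computed. It assumes the standard Normalized Google Distance of Cilibrasi and Vitányi as a stand-in, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), with counts taken over "contexts" (for example, spatio-temporal neighborhoods of interest points). The function names, the `contexts` input format, and the `cap` value used for word pairs that never co-occur are all hypothetical choices, not details from the paper.

```python
import math
from itertools import combinations

import numpy as np


def ngd(fx, fy, fxy, n, cap=1.0):
    """Normalized Google Distance from occurrence counts.

    fx, fy: number of contexts containing word x / word y
    fxy:    number of contexts containing both
    n:      total number of contexts
    cap:    assumed upper bound, also returned when the pair never co-occurs
    """
    if fx == 0 or fy == 0 or fxy == 0:
        return cap
    lx, ly = math.log(fx), math.log(fy)
    denom = math.log(n) - min(lx, ly)
    if denom <= 0:  # both words occur in every context
        return 0.0
    return min(cap, (max(lx, ly) - math.log(fxy)) / denom)


def ngld_correlogram(contexts, vocab_size, cap=1.0):
    """Build an L1-normalized vector of pairwise distances.

    contexts: iterable of collections of visual-word indices, one per context
              (hypothetical input format).
    Returns the upper triangle of the distance matrix, scaled to sum to 1.
    """
    contexts = [sorted(set(words)) for words in contexts]
    n = len(contexts)
    f = np.zeros(vocab_size, dtype=int)
    f_pair = np.zeros((vocab_size, vocab_size), dtype=int)
    for idx in contexts:
        for i in idx:
            f[i] += 1
        for i, j in combinations(idx, 2):
            f_pair[i, j] += 1
            f_pair[j, i] += 1

    dist = np.zeros((vocab_size, vocab_size))
    for i, j in combinations(range(vocab_size), 2):
        dist[i, j] = dist[j, i] = ngd(f[i], f[j], f_pair[i, j], n, cap)

    upper = dist[np.triu_indices(vocab_size, k=1)]
    total = upper.sum()
    return upper / total if total > 0 else upper


if __name__ == "__main__":
    # Toy example: 4 contexts over a 3-word vocabulary.
    toy = [{0, 1}, {0, 1}, {2}, {2, 0}]
    print(ngld_correlogram(toy, vocab_size=3))
```

In a pipeline like the one the abstract describes, a vector of this kind would be combined with the BoW histogram to form the final action representation. How the two are weighted or concatenated is not specified here.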
