Evaluation of Color STIPs for Human Action Recognition

This paper is concerned with recognizing realistic human actions in videos based on spatio-temporal interest points (STIPs). Existing STIP-based action recognition approaches operate on intensity representations of the image data. As a result, they are sensitive to disturbing photometric phenomena such as highlights and shadows. Moreover, discarding chromaticity from the photometric representation neglects valuable information. Color STIPs address both issues. They are multi-channel reformulations of existing intensity-based STIP detectors and descriptors, for which we consider a number of chromatic representations derived from the opponent color space. This enhanced modeling of appearance improves the quality of subsequent STIP detection and description. Color STIPs are shown to substantially outperform their intensity-based counterparts on the challenging UCF Sports, UCF11 and UCF50 action recognition benchmarks. Moreover, the results show that Color STIPs are currently the single best low-level feature choice for STIP-based approaches to human action recognition.
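
To make the opponent color space concrete, the following minimal sketch converts a single RGB pixel using the commonly used orthonormal opponent transform. The abstract does not spell out the exact chromatic representations that are evaluated, so the formula and the function name `rgb_to_opponent` are illustrative assumptions rather than the paper's definition.

```python
import math


def rgb_to_opponent(r, g, b):
    """Map one RGB pixel to opponent channels (O1, O2, O3).

    Assumption: the widely used orthonormal opponent transform;
    the paper's own representations may differ in detail.
    """
    o1 = (r - g) / math.sqrt(2)          # red-green chromatic channel
    o2 = (r + g - 2 * b) / math.sqrt(6)  # yellow-blue chromatic channel
    o3 = (r + g + b) / math.sqrt(3)      # intensity channel
    return o1, o2, o3


# Adding the same offset to R, G and B (a crude stand-in for a white
# highlight) changes only the intensity channel O3.
base = rgb_to_opponent(0.2, 0.4, 0.6)
shifted = rgb_to_opponent(0.5, 0.7, 0.9)
print(base)
print(shifted)
```

In this transform, O1 and O2 carry the chromatic information and are unaffected by an equal additive offset on all three channels, whereas O3 corresponds to the intensity that conventional STIPs operate on.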
