Action recognition based on spatio-temporal interest points

Action recognition has become an important research direction in computer vision, and recognition techniques have made remarkable progress in recent years. This interest is motivated by a wide spectrum of applications, such as smart surveillance, virtual reality, human-computer interaction, and motion analysis. In this paper, we divide action recognition based on spatio-temporal interest points into three fundamental processes: spatio-temporal interest point detection, feature classification, and action representation and recognition. We examine each of these three processes in turn and systematically compare the approaches used in each. Finally, we discuss open problems and possible future trends in action recognition.
