Comparison of cuboid and tracklet features for action recognition on surveillance videos

To recognize human actions in surveillance videos, action recognition methods in the literature are analyzed, and feature extraction methods that are promising for such videos are identified. Among local methods, the two most popular feature extraction methods (Dollar's “cuboid” features and Raptis and Soatto's “tracklet” features) are tested and compared. In their original applications, the two feature types were classified with different methods. To obtain a fairer comparison, both are classified here with the same classification method. In addition, because it is more realistic for recognition in real videos, the two most popular datasets, KTH and Weizmann, are classified using the splitting method. The test results show that tracklet features are more suitable than the other method for action recognition in real surveillance videos.
