Action Recognition for Videos by Long-Term Point Trajectory Analysis with Background Removal

Recently, dense trajectories were shown to be an efficient video motion representation for action recognition, achieving state-of-the-art results on a variety of video datasets. This paper improves their performance by taking camera motion into account. To estimate camera motion, the authors use long-term point trajectory analysis to cluster image points and propose an algorithm that identifies the likely background cluster among these clusters according to the characteristics of the background in a video. Because the original clusters do not separate foreground from background very well, the authors optimize the background cluster and then use it to rectify the trajectories. Experimental results on three challenging action datasets (Hollywood2, Olympic Sports, and UCF50) show that the rectified trajectories significantly outperform the original dense trajectories.
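
The abstract does not specify how the background cluster is chosen or how trajectories are rectified, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes trajectories are already clustered, picks the background cluster with a hypothetical heuristic (largest cluster size times spatial spread), estimates per-frame camera motion as the median displacement of the background trajectories (a pure translation model, which is a simplification), and subtracts it from every trajectory. The function names `pick_background_cluster` and `rectify_trajectories` are illustrative and are not taken from the paper.

```python
import numpy as np


def pick_background_cluster(trajectories, labels):
    """Return the label of the cluster assumed to be background.

    Hypothetical heuristic (not the paper's algorithm): background points
    are numerous and spread over the frame, so score each cluster by
    size * spatial spread and keep the best one.
    trajectories: float array of shape (N, T, 2); labels: int array (N,).
    """
    best_label, best_score = None, -1.0
    for label in np.unique(labels):
        member = labels == label
        points = trajectories[member].reshape(-1, 2)
        spread = points.std(axis=0).prod()  # product of x and y std
        score = member.sum() * spread
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def rectify_trajectories(trajectories, labels, background_label):
    """Subtract the estimated camera motion from every trajectory.

    Assumption: camera motion per frame is a single 2D translation,
    estimated as the median displacement of the background trajectories.
    """
    displacements = np.diff(trajectories, axis=1)              # (N, T-1, 2)
    camera = np.median(displacements[labels == background_label], axis=0)
    residual = displacements - camera                          # broadcast over N
    start = trajectories[:, :1, :]                             # (N, 1, 2)
    # Rebuild each trajectory from its first point and the residual motion.
    return np.concatenate([start, start + np.cumsum(residual, axis=1)], axis=1)
```

With this translation-only model, background trajectories become stationary after rectification and foreground trajectories keep only their motion relative to the camera. A more realistic model such as a homography fitted per frame pair would replace the median step.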
