Bags of Trajectory Words for video indexing

A semantic indexing system capable of detecting both spatial appearance and motion-related semantic concepts requires both spatial and motion descriptors. However, extracting motion descriptors from very large video collections demands substantial computational resources, which has led most approaches to limit themselves to spatial descriptions. This paper explores the use of motion descriptors to complement such spatial descriptions and improve the overall performance of a generic semantic indexing system. We propose a framework for extracting and describing trajectories of tracked points that keeps the computational cost manageable, and then construct Bag of Words representations from these trajectories. After supervised classification, a late fusion step combines information from spatial descriptors with that from our proposed Bag of Trajectory Words descriptors to improve the overall results. We evaluate our approach on the challenging TRECVid Semantic Indexing (SIN) dataset.
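
To make the pipeline concrete, the sketch below illustrates how trajectory descriptors could be quantized into a Bag of Trajectory Words histogram and how per-concept classifier scores could be combined by late fusion. This is a minimal illustration, not the authors' implementation: the function names, the codebook size, the fusion weight, and the scikit-learn usage are assumptions, and trajectory descriptors are assumed to be fixed-length vectors (e.g. concatenated frame-to-frame displacements of a tracked point).

```python
# Hypothetical sketch of the described pipeline; names and parameters are illustrative.
import numpy as np
from sklearn.cluster import KMeans


def build_codebook(train_trajectories, vocab_size=256, seed=0):
    """Cluster trajectory descriptors with k-means (k-means++ init) to form
    a vocabulary of 'trajectory words'."""
    return KMeans(n_clusters=vocab_size, init="k-means++",
                  n_init=1, random_state=seed).fit(train_trajectories)


def encode_video(trajectories, codebook):
    """Assign each trajectory to its nearest word and build an L1-normalized
    Bag of Trajectory Words histogram for the video (or shot)."""
    words = codebook.predict(trajectories)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)


def late_fusion(spatial_scores, trajectory_scores, alpha=0.5):
    """Combine per-concept classifier scores from spatial descriptors and from
    Bag of Trajectory Words descriptors with a simple weighted average."""
    return alpha * np.asarray(spatial_scores) + (1.0 - alpha) * np.asarray(trajectory_scores)
```

The histograms produced by `encode_video` would then feed a supervised classifier per concept, and `late_fusion` shows one simple way to merge its scores with those obtained from spatial descriptors; the actual fusion strategy and weighting are design choices not specified here.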
