A Motion-Driven Approach for Fine-Grained Temporal Segmentation of User-Generated Videos

This paper presents an algorithm for the temporal segmentation of user-generated videos into visually coherent parts that correspond to individual video-capturing activities, namely camera panning and tilting, changes in focal length (zooming), and camera displacement. The proposed approach identifies these activities by extracting and evaluating the region-level spatio-temporal distribution of the optical flow over sequences of neighbouring video frames. The algorithm was evaluated on a newly constructed ground-truth dataset and compared against several state-of-the-art techniques and variations of them. Extensive evaluation indicates that the proposed approach is competitive in terms of detection accuracy, and highlights its suitability for analysing large collections of data in a time-efficient manner.
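To make the idea of evaluating region-level optical flow more concrete, the following is a minimal Python sketch. It is not the authors' algorithm. It assumes that optical flow has already been estimated and averaged into a hypothetical grid (e.g. 3x3) of per-region mean vectors `(dx, dy)` for each window of neighbouring frames. The labels, the thresholds `min_mag` and `min_agree`, and the helper names `classify_motion` and `segment` are illustrative choices. Camera displacement is not modelled separately here; it simply falls into the catch-all `"other"` label together with roll and object motion.

```python
from itertools import groupby
from math import hypot

# Hypothetical input: a grid (e.g. 3x3) of per-region mean optical-flow
# vectors (dx, dy) in pixels, averaged over a short window of neighbouring frames.
Grid = list[list[tuple[float, float]]]


def classify_motion(grid: Grid, min_mag: float = 0.5, min_agree: float = 0.7) -> str:
    """Label one window from region-level flow; thresholds are illustrative."""
    rows, cols = len(grid), len(grid[0])
    cy, cx = (rows - 1) / 2, (cols - 1) / 2  # grid centre in cell units
    # Each entry: (offset_x, offset_y) of the region from the centre, plus its flow.
    cells = [(c - cx, r - cy, dx, dy)
             for r, row in enumerate(grid) for c, (dx, dy) in enumerate(row)]
    n = len(cells)

    mean_dx = sum(dx for _, _, dx, _ in cells) / n
    mean_dy = sum(dy for _, _, _, dy in cells) / n
    mean_mag = sum(hypot(dx, dy) for _, _, dx, dy in cells) / n
    if mean_mag < min_mag:
        return "static"  # too little motion anywhere in the frame

    # Radial component: project each region's flow onto the unit vector from
    # the grid centre to that region. Expanding flow (> 0) suggests zooming in,
    # contracting flow (< 0) zooming out; uniform translation averages to ~0.
    radial, off_centre = 0.0, 0
    for px, py, dx, dy in cells:
        norm = hypot(px, py)
        if norm > 0:
            radial += (dx * px + dy * py) / norm
            off_centre += 1
    radial /= max(off_centre, 1)

    translation = hypot(mean_dx, mean_dy)
    if abs(radial) > translation:
        return "zoom_in" if radial > 0 else "zoom_out"

    # Pan/tilt: most regions should move in the same direction as the mean flow.
    agree = sum(1 for _, _, dx, dy in cells if dx * mean_dx + dy * mean_dy > 0) / n
    if agree >= min_agree:
        return "pan" if abs(mean_dx) >= abs(mean_dy) else "tilt"
    # Everything else (e.g. roll, parallax from camera displacement, object motion).
    return "other"


def segment(labels: list[str]) -> list[tuple[int, int, str]]:
    """Merge consecutive windows that share a label into (start, end, label) runs."""
    segments, start = [], 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((start, start + length - 1, label))
        start += length
    return segments
```

The radial projection is what separates zooming from panning or tilting in this sketch: under a zoom, flow vectors point away from (or towards) the frame centre, so their projections onto the outward direction add up, whereas a uniform translation cancels out across symmetric regions. Running `classify_motion` over successive windows and passing the resulting labels to `segment` yields a list of sub-shot boundaries such as `(start_window, end_window, "pan")`.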
