Circulant Temporal Encoding for Video Retrieval and Temporal Alignment

We address the problem of specific video event retrieval. Given a query video of a specific event, e.g., a Madonna concert, the goal is to retrieve other videos of the same event that temporally overlap with the query. Our approach encodes the frame descriptors of a video so that they jointly represent appearance and temporal order. It exploits the properties of circulant matrices to compare videos efficiently in the frequency domain. This substantially reduces the computational cost of the comparison and accurately localizes the matching parts of the videos. The descriptors can be compressed in the frequency domain with a product quantizer adapted to complex numbers, in which case retrieval is performed without decompressing them. We also consider the temporal alignment of a set of videos. For this, we exploit the matching confidence and the temporal offset estimate that our retrieval approach computes for every pair of videos. Our robust algorithm aligns the videos on a global timeline by maximizing the set of temporally consistent matches. The resulting global alignment enables synchronous playback of the videos of a given scene.
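The frequency-domain comparison can be illustrated with a minimal sketch. It assumes each video is a NumPy array of shape (frames, dimensions) holding L2-normalized frame descriptors, and it computes the sum of frame-to-frame dot products for every temporal offset at once through the correlation theorem. The function name `correlate_videos`, the zero-padding choice, and the synthetic data are assumptions made for illustration. The sketch uses plain, unregularized cross-correlation and omits the compression step, so it should not be read as the full method.

```python
import numpy as np


def correlate_videos(query, database):
    """Score every temporal offset between two descriptor sequences.

    scores[d] = sum_t <query[t], database[t + d]>, computed with FFTs
    along the time axis. Returns (best_offset, best_score, scores).
    """
    t_q, t_b = query.shape[0], database.shape[0]
    # Zero-pad to t_q + t_b frames so circular correlation does not wrap
    # real frames of one video onto real frames of the other.
    n = t_q + t_b
    q_hat = np.fft.rfft(query, n=n, axis=0)
    b_hat = np.fft.rfft(database, n=n, axis=0)
    # Correlation theorem: multiply conj(Q) by B per frequency, then sum
    # over descriptor dimensions before a single inverse transform.
    spectrum = np.sum(np.conj(q_hat) * b_hat, axis=1)
    scores = np.fft.irfft(spectrum, n=n)
    idx = int(np.argmax(scores))
    # Indices at or beyond t_b encode negative offsets (query starts earlier).
    offset = idx if idx < t_b else idx - n
    return offset, float(scores[idx]), scores


# Synthetic example: both clips are cut from the same underlying sequence.
rng = np.random.default_rng(0)
base = rng.standard_normal((200, 16))
base /= np.linalg.norm(base, axis=1, keepdims=True)
database = base[0:120]   # covers frames 0..119 of the event
query = base[50:90]      # covers frames 50..89, i.e. starts 50 frames later
offset, score, _ = correlate_videos(query, database)
print(offset, round(score, 3))
```

The peak location gives the offset estimate and the peak value serves as a matching confidence. These are the two per-pair quantities that a global alignment step would consume.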
