3D Interest Maps From Simultaneous Video Recordings

We consider an emerging situation in which multiple cameras film the same event simultaneously from a diverse set of angles. The captured videos provide the multiple-view geometry and an understanding of the 3D structure of the scene. In this paper, we extend this understanding by introducing the concept of a 3D interest map. Because most users film what they find interesting from their respective viewpoints, the 3D structure can be annotated with a level of interest that is naturally crowdsourced from the users. A 3D interest map can be understood as an extension of saliency maps to 3D space that captures the semantics of the scene. We evaluate the idea of 3D interest maps on two real datasets, captured in settings where the environment or the cameras are sufficiently equipped to provide an estimate of the camera poses and a reasonable synchronization between the cameras. We study two aspects of 3D interest maps in our evaluation. First, by projecting them into 2D, we compare them to state-of-the-art saliency maps. Second, to demonstrate their usefulness, we apply them to a video mashup system that automatically produces an edited video from one of the datasets.
