Paper information - Extracting Regular FOV Shots from 360 Event Footage

Video summaries are a popular way to share important events, but creating good summaries is hard: it requires expertise in both capturing and editing footage. Hiring a professional videographer is possible but too costly for most casual events. An alternative is to place 360 video cameras around an event space to capture footage passively and then extract regular field-of-view (RFOV) shots for the summary. This paper focuses on the problem of extracting such RFOV shots. Since we cannot actively control the cameras or the scene, it is hard to create "ideal" shots that adhere strictly to traditional cinematography rules. To better understand the tradeoffs, we study human preferences for static and moving-camera RFOV shots generated from 360 footage. From the findings, we derive design guidelines. As a secondary contribution, we use these guidelines to develop automatic algorithms, which we demonstrate in a prototype user interface for extracting RFOV shots from 360 videos.
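
To make the notion of an RFOV shot concrete, the sketch below shows one common way a regular field-of-view frame can be sampled from a 360 frame. It is not the paper's algorithm: it assumes the 360 footage is stored as an equirectangular panorama and models the virtual camera as an ideal pinhole. The function name `rfov_pixel_to_equirect` and all of its parameters are hypothetical, chosen only for this illustration.

```python
import math


def rfov_pixel_to_equirect(u, v, out_w, out_h, hfov_deg,
                           yaw_deg, pitch_deg, pano_w, pano_h):
    """Map pixel (u, v) of a virtual pinhole shot to panorama coordinates.

    Assumption: the panorama is equirectangular, with longitude 0 at the
    horizontal center and latitude +90 degrees at the top row.
    """
    # Focal length in pixels, derived from the horizontal field of view.
    f = (out_w / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)

    # Ray through the pixel in camera coordinates: x right, y down, z forward.
    x, y, z = u - out_w / 2.0, v - out_h / 2.0, f

    # Tilt the camera up by the pitch angle (rotation about the x axis).
    p = math.radians(pitch_deg)
    y, z = y * math.cos(p) - z * math.sin(p), y * math.sin(p) + z * math.cos(p)

    # Turn the camera right by the yaw angle (rotation about the y axis).
    t = math.radians(yaw_deg)
    x, z = x * math.cos(t) + z * math.sin(t), -x * math.sin(t) + z * math.cos(t)

    # Convert the ray direction to longitude and latitude.
    lon = math.atan2(x, z)
    lat = math.atan2(-y, math.hypot(x, z))

    # Convert longitude and latitude to panorama pixel coordinates.
    px = (lon / (2.0 * math.pi) + 0.5) * pano_w
    py = (0.5 - lat / math.pi) * pano_h
    return px, py


if __name__ == "__main__":
    # Where does the center of a 640x480, 90-degree shot land in a 4096x2048 panorama?
    print(rfov_pixel_to_equirect(320, 240, 640, 480, 90, 0, 0, 4096, 2048))
```

Evaluating this mapping for every output pixel and sampling the panorama at the returned coordinates yields one RFOV frame. Under these assumptions, a static shot keeps yaw, pitch, and field of view fixed over time, while a moving-camera shot varies them from frame to frame.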
