Automatic editing of footage from multiple social cameras

We present an approach that takes multiple videos captured by social cameras---cameras that are carried or worn by members of the group involved in an activity---and produces a coherent "cut" video of the activity. Footage from social cameras offers an intimate, personalized view that reflects the part of an event that mattered to the camera operator (or wearer). We leverage the insight that social cameras share the focus of attention of the people carrying them. This insight lets us determine where the important "content" in a scene is taking place, and we combine it with cinematographic guidelines to select which camera to cut to and when to make each cut. A trellis graph representation is used to optimize an objective function that maximizes coverage of the important content in the scene while respecting cinematographic guidelines, such as the 180-degree rule and avoiding jump cuts. We demonstrate cut videos in various styles and lengths for a number of scenarios, including sports games, street performances, family activities, and social get-togethers. We evaluate our results through an in-depth analysis of the cuts in the resulting videos and through comparison with videos produced by a professional editor and by existing commercial solutions.
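The trellis-graph optimization described above can be sketched as a Viterbi-style dynamic program: each time step contributes one node per camera, node scores reward coverage of the inferred focus of attention, and edge penalties discourage cuts (e.g., jump cuts). The function name, score layout, and the single flat `cut_penalty` below are illustrative assumptions, not the paper's actual objective terms or weights.

```python
def select_cameras(quality, cut_penalty=1.0):
    """Pick one camera per time step by maximizing total score on a trellis.

    quality[t][c] -- coverage score of camera c at time step t (assumed input).
    cut_penalty   -- cost charged whenever consecutive steps switch cameras.
    Returns the best-scoring camera index sequence via Viterbi-style DP.
    """
    T, C = len(quality), len(quality[0])
    score = [list(quality[0])]   # best path score ending at node (t, c)
    back = [[0] * C]             # backpointer to the best predecessor camera

    for t in range(1, T):
        row, bp = [], []
        for c in range(C):
            # Staying on the same camera is free; switching pays the penalty.
            best_prev = max(
                range(C),
                key=lambda p: score[-1][p] - (cut_penalty if p != c else 0.0),
            )
            transition = cut_penalty if best_prev != c else 0.0
            row.append(quality[t][c] + score[-1][best_prev] - transition)
            bp.append(best_prev)
        score.append(row)
        back.append(bp)

    # Backtrack from the best final node to recover the camera sequence.
    c = max(range(C), key=lambda k: score[-1][k])
    path = [c]
    for t in range(T - 1, 0, -1):
        c = back[t][c]
        path.append(c)
    return path[::-1]
```

With a small penalty the path cuts to whichever camera covers the content (`select_cameras([[1, 0], [1, 0], [0, 1]], cut_penalty=0.5)` yields `[0, 0, 1]`); with a large penalty it holds the current shot even when another camera scores slightly higher, mirroring how the objective trades coverage against cut frequency.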
