Automatic Rush Generation with Application to Theatre Performances (Génération Automatique de Prises de Vues Cinématographiques avec Applications aux Captations de Théâtre)

Professional-quality videos of live staged performances are created by recording them from several well-chosen viewpoints. The recordings are then edited together to tell the story and draw out the intended emotion in viewers. Creating such videos requires multiple high-quality cameras and skilled camera operators. This thesis aims to make professional-quality videos attainable for low-budget productions, without a fully and expensively equipped crew of camera operators. A single high-resolution static camera replaces the multi-camera crew, and the operators' camera movements are simulated by virtually panning, tilting, and zooming within the original recording. We show that multiple virtual cameras can be simulated by choosing different trajectories of cropping windows inside the original recording. One of the key novelties of this work is an optimization framework for computing the virtual camera trajectories from information extracted from the original video with computer vision techniques. The actors on stage are considered the most important elements of the scene. To localize and name actors, we introduce generative models for learning view-independent, person- and costume-specific detectors from a set of labeled examples. We explain how to learn the models from a small number of labeled keyframes or video tracks, and how to detect novel appearances of the actors in a maximum likelihood framework. We demonstrate that such actor-specific models can accurately localize actors despite changes in viewpoint and occlusions, and significantly improve detection recall over generic detectors. The dissertation then presents an offline algorithm for tracking objects and actors in long video sequences using these actor-specific models.
Detection is first run independently on each frame of the video to select candidate locations of the actor or object. The candidate detections are then combined into smooth trajectories in an optimization step that minimizes a cost function accounting for false detections and occlusions. Using the actor tracks, we propose a framework for automatically generating multiple clips suitable for video editing by simulating pan-tilt-zoom camera movements within the frame of a single static camera. Our method requires only minimal user input to define the subject matter of each sub-clip. The composition of each sub-clip is automatically computed in a novel L1-norm optimization framework. Our approach encodes several common cinematographic practices into a single convex cost function minimization problem, resulting in aesthetically pleasing sub-clips that can easily be edited together using off-the-shelf multi-clip video editing software.
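
The tracking step described above, in which per-frame candidate detections are linked into one smooth trajectory, can be illustrated with a small dynamic-programming sketch. This is not the dissertation's algorithm or cost function: the function name `link_detections`, the parameters `motion_weight` and `occlusion_cost`, and the specific cost terms are all assumptions chosen for illustration. Each frame gets one extra "occluded" state, low-confidence detections pay a penalty, and large jumps between consecutive positions are penalized. As a simplification, transitions into and out of the occluded state are free.

```python
import math

OCCLUDED = None  # hypothetical marker: actor not visible in this frame


def link_detections(frames, motion_weight=0.01, occlusion_cost=1.0):
    """Pick one candidate (or OCCLUDED) per frame by dynamic programming.

    frames: list of per-frame candidate lists; each candidate is
            (x, y, score) with score in [0, 1], higher = more confident.
    Returns one (x, y) tuple or None (occluded) per frame.
    """
    if not frames:
        return []

    def unary(c):
        # Assumed cost: occlusion pays a fixed price, detections pay 1 - score.
        return occlusion_cost if c is OCCLUDED else 1.0 - c[2]

    def pairwise(a, b):
        # Assumed smoothness term: Euclidean jump between consecutive frames.
        if a is OCCLUDED or b is OCCLUDED:
            return 0.0
        return motion_weight * math.hypot(a[0] - b[0], a[1] - b[1])

    # Every frame may also be explained by the occluded state.
    states = [list(f) + [OCCLUDED] for f in frames]

    cost = [unary(s) for s in states[0]]
    back = []  # back[t-1][i] = best predecessor index for state i of frame t
    for t in range(1, len(states)):
        prev = states[t - 1]
        new_cost, pointers = [], []
        for s in states[t]:
            totals = [cost[j] + pairwise(prev[j], s) for j in range(len(prev))]
            j_best = min(range(len(prev)), key=totals.__getitem__)
            new_cost.append(totals[j_best] + unary(s))
            pointers.append(j_best)
        cost = new_cost
        back.append(pointers)

    # Backtrack from the cheapest final state.
    idx = min(range(len(cost)), key=cost.__getitem__)
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()

    return [
        None if states[t][i] is OCCLUDED else states[t][i][:2]
        for t, i in enumerate(path)
    ]


# Toy input: frame 1 contains a confident but distant false detection,
# and frame 2 has no detections at all.
toy_frames = [
    [(100, 50, 0.9)],
    [(104, 50, 0.8), (400, 60, 0.95)],
    [],
    [(112, 51, 0.85)],
]
toy_track = link_detections(toy_frames)
```

In this toy input the distant candidate in the second frame is rejected despite its higher score, and the empty third frame is explained by the occluded state. Because the whole sequence is optimized at once, a formulation of this kind suits offline tracking rather than live use.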
