Towards a format-agnostic approach for production, delivery and rendering of immersive media

The media industry is currently being pulled in the often-opposing directions of increased realism (high resolution, stereoscopic, large screen) and personalization (selection and control of content, availability on many devices). We investigate the feasibility of an end-to-end format-agnostic approach to support both of these trends. In this paper, different aspects of a format-agnostic capture, production, delivery and rendering system are discussed. At the capture stage, the concept of layered scene representation is introduced, including panoramic video and 3D audio capture. At the analysis stage, a virtual director component is discussed that allows for automatic execution of cinematographic principles, using feature tracking and saliency detection. At the delivery stage, resolution-independent audiovisual transport mechanisms for both managed and unmanaged networks are treated. At the rendering stage, a rendering process is introduced that manipulates the audiovisual content to match the properties of the connected display and loudspeakers. Different parts of the complete system are revisited, demonstrating the requirements and the potential of this advanced concept.
