Spatial-Random-Access-Enabled Video Coding for Interactive Virtual Pan/Tilt/Zoom Functionality

High-spatial-resolution videos offer the possibility of viewing an arbitrary region-of-interest (RoI) interactively. Zoom functionality enables watching high-resolution content even on displays of lower spatial resolution. If arbitrary regions corresponding to arbitrary zoom factors can be served to the user, the transmission and/or decoding of the entire high-spatial-resolution video can be avoided. Moreover, if the video content is encoded such that arbitrary RoIs corresponding to different zoom factors can simply be extracted from the compressed bitstream, dedicated video encoding for each user can be avoided. We propose such a video coding scheme. It is vital in allowing the system to scale to large numbers of remote users, and it allows the content to be encoded once and stored for subsequent repeated playback. Apart from generating a multiresolution representation, our coding scheme uses P slices from H.264/AVC. We study the tradeoff in the choice of slice size: a larger slice size enables higher coding efficiency for representing the entire scene, but it increases the number of pixels that have to be transmitted for a given RoI. The optimal slice size achieves the best balance between these two effects and minimizes the expected transmission bitrate. Experimental results confirm the optimality of our predicted slice size for various test cases. Furthermore, we propose an improvement based on background extraction and long-term memory motion-compensated prediction. Experiments indicate up to 85% bitrate reduction while retaining efficient random access capability.
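The slice-size tradeoff can be made concrete with a small model. The sketch below is illustrative and is not the authors' implementation. It assumes a uniform rectangular slice grid in each resolution layer, an RoI whose offset relative to that grid is uniformly distributed, and no clipping at frame borders. The hypothetical helper `slices_for_roi` lists the slices a server would extract for a requested RoI. `expected_pixels` gives the closed-form expected number of transmitted pixels, (W - 1 + w)(H - 1 + h) for an RoI of W x H pixels and slices of w x h pixels. `best_slice_size` picks the candidate that minimizes bits per pixel times expected pixels. The function `hypothetical_bpp` is an invented stand-in that reflects only the qualitative trend stated above (larger slices code the scene more efficiently); real rates would have to be measured by encoding the content at each slice size.

```python
from typing import Callable, Iterable, List, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def slices_for_roi(x: int, y: int, roi_w: int, roi_h: int,
                   slice_w: int, slice_h: int) -> List[Tuple[int, int]]:
    """Return (column, row) indices of every slice overlapping the RoI.

    The RoI covers pixels x..x+roi_w-1 and y..y+roi_h-1 in the coordinates
    of the resolution layer chosen for the requested zoom factor.
    """
    cols = range(x // slice_w, (x + roi_w - 1) // slice_w + 1)
    rows = range(y // slice_h, (y + roi_h - 1) // slice_h + 1)
    return [(c, r) for r in rows for c in cols]


def expected_pixels(roi_w: int, roi_h: int, slice_w: int, slice_h: int) -> float:
    """Closed-form expected number of pixels to transmit per frame.

    Assumes the RoI offset relative to the slice grid is uniform over all
    slice_w * slice_h integer positions and ignores frame borders. Along one
    axis, an RoI of length L overlaps (L - 1) / s + 1 slices of length s on
    average, i.e. L - 1 + s pixels.
    """
    return float((roi_w - 1 + slice_w) * (roi_h - 1 + slice_h))


def expected_pixels_brute_force(roi_w: int, roi_h: int,
                                slice_w: int, slice_h: int) -> float:
    """Same quantity obtained by enumerating every grid offset (for checking)."""
    area = slice_w * slice_h
    total = sum(len(slices_for_roi(ox, oy, roi_w, roi_h, slice_w, slice_h)) * area
                for oy in range(slice_h) for ox in range(slice_w))
    return total / area


def best_slice_size(roi_w: int, roi_h: int, candidates: Iterable[Size],
                    bits_per_pixel: Callable[[int, int], float]) -> Size:
    """Pick the candidate slice size minimizing expected bits per frame."""
    return min(candidates,
               key=lambda s: bits_per_pixel(*s) * expected_pixels(roi_w, roi_h, *s))


def hypothetical_bpp(slice_w: int, slice_h: int) -> float:
    """HYPOTHETICAL coding-efficiency model: bits per pixel fall as slices grow.

    Encodes only the qualitative trend; replace with measured rates.
    """
    return 0.05 * (1 + 16 / slice_w + 16 / slice_h)


if __name__ == "__main__":
    roi = (480, 272)  # hypothetical display-sized RoI
    sizes = [(s, s) for s in (16, 32, 64, 128, 256)]
    for s in sizes:
        bits = hypothetical_bpp(*s) * expected_pixels(*roi, *s)
        print(f"slice {s}: expected {bits:.0f} bits/frame")
    print("best:", best_slice_size(*roi, sizes, hypothetical_bpp))
```

With the made-up model above and a 480 x 272 RoI, the minimum falls at 64 x 64 slices. Smaller slices pay in bits per pixel, and larger slices pay in extra transmitted pixels. The location of the real optimum depends entirely on the measured rates for the content at hand.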
