3D information coding

There are several technologies available for the transmission and presentation of 3D information to the human user. The appropriate choice of technology depends largely on the application as well as on the maturity of the required technology. 3D video was established some time ago in niche markets, including professional applications (e.g., scientific visualization) and entertainment (IMAX cinemas, 3D gaming) [1]; since 2008 it has become mainstream in digital cinemas, with the consumer market preparing for 3D video through the introduction of stereo TVs and related technology. Depending on the application, different presentations and data formats are required.

For scientific visualization, 3D data formats are used. The 3D data is either rendered at the server or transmitted to the receiver or client, which renders the appropriate view as determined by the user. The objects can be displayed correctly on both monoscopic and stereoscopic displays. 3D games typically require the 3D data at the client in order to enable low-latency interaction with the data. The most common 3D data sets consist of 3D geometry and 2D texture data. If a 3D object is recorded from all allowed viewing angles, this data set of images can be used for image-based rendering, where at any time one of the recorded images is shown on the display. Image-based rendering may be combined with 3D geometry and animation [2]. The estimation of the 3D geometry of natural scenes is a challenging task [3]; therefore, most applications rely on synthetic or manually created 3D data.

For the presentation of 3D movies, there are stereo displays, which require glasses to separate the left and right views, as well as autostereoscopic displays, which require no glasses. Independent of the viewer's position and orientation, a stereo display presents one view to the left eye and one view to the right eye. Such displays simply need to receive two video streams, typically recorded with a stereo camera.
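The image-based rendering idea above — always showing the recorded image closest to the current viewpoint — can be sketched as a simple nearest-view lookup. This is an illustrative assumption rather than a method from the text: views are indexed by a single recording angle in degrees, sorted in ascending order, and angular wrap-around at 360° is ignored for simplicity.

```python
import bisect

def select_view(recorded_angles, view_angle):
    """Return the index of the recorded image whose camera angle is
    closest to the requested viewing angle (angles in degrees,
    sorted ascending; no wrap-around handling)."""
    i = bisect.bisect_left(recorded_angles, view_angle)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(recorded_angles)]
    return min(candidates, key=lambda j: abs(recorded_angles[j] - view_angle))

# Hypothetical capture setup: one view recorded every 10 degrees.
angles = list(range(0, 360, 10))
print(select_view(angles, 23.7))  # → 2 (the 20-degree view)
```

In a real image-based renderer the lookup would be driven by the tracked head position of the user, and neighboring views might be blended rather than switched.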
MPEG has developed video coding standards to support these displays. The latest standard is MVC, an extension of AVC for the coding of stereo sequences using inter-view prediction [4]. Due to the sudden popularity of stereoscopic video, service providers face the challenge of transmitting stereo over the existing video distribution and presentation infrastructure. One solution is referred to as frame packing (FP), because the two video frames are copied, typically side by side, into one frame prior to encoding. Hence, each view of the resulting stereo sequence has only half the regular resolution. The set-top box decodes the FP picture as one regular frame and signals FP to the display, which upconverts each half of the decoded frame to full size for the left and right views. In MPEG-2, FP is signaled using private data (USA, Japan) or a newly developed extension of the MPEG-2 Transport Stream. AVC, which is mainly used for HDTV distribution, signals FP using SEI messages. MPEG is currently working on an extension of AVC that enhances FP to full-resolution stereo while maintaining compatibility with AVC using FP.

In the real world, the view of a scene depends on the position and orientation of the viewer's head. Hence, the images presented to the two eyes should change depending on the eye positions. To a limited extent, autostereoscopic displays support this feature by displaying several views simultaneously, with a lens in front of the screen ensuring that two appropriate views are always visible to the eyes of a viewer [4]. Starting with subjective tests in 2011, MPEG is targeting this 3DV application by coding several video streams together with a depth map of the scene, enabling the display to create the appropriate views by means of a view synthesis algorithm. The main challenges are the reliable estimation of the depth map, which may be supported manually for non-real-time applications, and the view synthesis algorithm itself.
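The side-by-side frame packing described above can be sketched with a few array operations. This is a hedged illustration, not an implementation of any standard: real systems low-pass filter before subsampling and interpolate on upconversion, whereas this sketch uses plain column decimation and pixel repetition on grayscale frames.

```python
import numpy as np

def pack_side_by_side(left, right):
    """Pack two full-resolution views into one frame of the same size
    by horizontally subsampling each view to half width (simple
    decimation; real encoders typically low-pass filter first)."""
    return np.hstack((left[:, ::2], right[:, ::2]))

def unpack_side_by_side(frame):
    """Split a frame-packed picture and upconvert each half back to
    full width by pixel repetition (a display would interpolate)."""
    w = frame.shape[1] // 2
    left_half, right_half = frame[:, :w], frame[:, w:]
    return (np.repeat(left_half, 2, axis=1),
            np.repeat(right_half, 2, axis=1))

# Toy 2x8 "frames" standing in for the left and right views.
left = np.arange(16, dtype=np.uint8).reshape(2, 8)
right = left + 100
packed = pack_side_by_side(left, right)   # same size as one input view
l2, r2 = unpack_side_by_side(packed)      # half horizontal resolution lost
```

Note that `l2` and `r2` have the original dimensions but carry only half the horizontal detail, which is exactly the resolution penalty of FP that the full-resolution AVC extension mentioned above is meant to remove.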
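Depth-based view synthesis, as targeted by the 3DV activity, can be illustrated by a minimal forward-warping sketch. The disparity model (`shift * depth / 255`, with larger depth-map values meaning nearer pixels) and the absence of hole filling or blending are simplifying assumptions for illustration only; actual view synthesis algorithms are considerably more elaborate.

```python
import numpy as np

def synthesize_view(texture, depth, baseline_shift):
    """Warp a grayscale texture image to a virtual viewpoint by shifting
    each pixel horizontally by a disparity proportional to its depth
    value. Pixels are written far-to-near so that nearer pixels
    overwrite farther ones; uncovered positions are returned as holes."""
    h, w = depth.shape
    out = np.zeros_like(texture)
    filled = np.zeros((h, w), dtype=bool)
    # Ascending depth value = far-to-near under the assumed convention.
    for idx in np.argsort(depth, axis=None):
        y, x = divmod(int(idx), w)
        xs = x + int(round(baseline_shift * depth[y, x] / 255.0))
        if 0 <= xs < w:
            out[y, xs] = texture[y, x]
            filled[y, xs] = True
    return out, ~filled  # synthesized view and hole mask

# Flat scene at the far plane: zero disparity, view is unchanged.
tex = np.arange(12, dtype=np.uint8).reshape(3, 4)
dep = np.zeros((3, 4), dtype=np.uint8)
view, holes = synthesize_view(tex, dep, 2)
```

The hole mask makes the second challenge mentioned above concrete: wherever the warp exposes scene content that no input view recorded, a real synthesizer must inpaint or blend from additional views.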