Development of the MPEG-H TV Audio System for ATSC 3.0

A new TV audio system based on the MPEG-H 3D Audio standard has been designed, tested, and implemented for ATSC 3.0 broadcasting. The system offers immersive sound to increase the realism of programming, and audio objects that enable interactivity and personalization by viewers. Immersive sound may be broadcast using loudspeaker channel-based signals or scene-based components, in combination with static or dynamic audio objects. Interactivity can be enabled through broadcaster-authored preset mixes or through user control of object gains and positions. Improved loudness and dynamic range control allows the sound to be tailored for best reproduction on a variety of consumer devices and in a variety of listening environments. The system includes features that allow operation in HD-SDI broadcast plants; storage and editing of complex audio programs on existing video editing software or digital audio workstations; frame-accurate switching of programs; and new technologies that adapt current mixing consoles for live broadcast production of immersive and interactive sound. Field tests at live broadcast events were conducted during system design, and a live demonstration test bed was constructed to prove the viability of the design. The system also includes receiver-side components that enable interactivity, binaural rendering for headphone or tablet computer listening, a "3D soundbar" for immersive playback without overhead loudspeakers, and transport over HDMI 1.4 connections in consumer equipment. The system has been selected as a proposed standard for ATSC 3.0 and is the sole audio system of the UHD ATSC 3.0 broadcasting service currently being deployed in South Korea.
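
To make the object-interactivity idea concrete, the sketch below shows how a viewer-requested gain change for an audio object (e.g., commentary) could be clamped to broadcaster-authored limits before rendering. The field and function names are illustrative assumptions only; they do not reproduce the normative MPEG-H 3D Audio metadata syntax.

```python
from dataclasses import dataclass

@dataclass
class AudioObject:
    """Hypothetical per-object metadata, loosely modeled on the
    interactivity limits described above (names are illustrative)."""
    name: str
    default_gain_db: float   # broadcaster-authored default level
    min_gain_db: float       # lower interactivity limit set by the broadcaster
    max_gain_db: float       # upper interactivity limit set by the broadcaster
    azimuth_deg: float = 0.0 # nominal position; position offsets could be limited similarly

def apply_user_gain(obj: AudioObject, user_offset_db: float) -> float:
    """Clamp the viewer-requested gain change to the authored range
    and return the linear gain to hand to the renderer."""
    gain_db = obj.default_gain_db + user_offset_db
    gain_db = max(obj.min_gain_db, min(obj.max_gain_db, gain_db))
    return 10.0 ** (gain_db / 20.0)

# Example: the viewer asks for +9 dB of commentary, but the preset caps it at +6 dB.
commentary = AudioObject("commentary", default_gain_db=0.0,
                         min_gain_db=-60.0, max_gain_db=6.0)
print(apply_user_gain(commentary, user_offset_db=9.0))  # ~1.995, i.e. +6 dB
```

In practice such limits would travel with the stream as metadata, so the receiver-side renderer can honor personalization requests without letting viewers break the broadcaster's intended mix.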
