MPEG Unified Speech and Audio Coding – Bridging the Gap

Speech and audio coding schemes originate from different worlds. Speech coding schemes typically assume a source model, i.e., a model of the human vocal tract. General audio coding schemes primarily rely on a sink model, i.e., a model of the human auditory system. While speech coding schemes work well at very low bitrates for the signal class they were designed for, they are known to fail for general audio signals even at higher bitrates. In contrast, general audio coders work well for any content at higher bitrates, but their performance is typically limited at very low bitrates, especially for speech signals. Recently, the ISO/MPEG group started a standardization activity to develop a new Unified Speech and Audio Coding (USAC) scheme. A state-of-the-art AAC-based general audio coder, featuring transform coding, parametric bandwidth extension, and parametric stereo coding, was extended with source-model coding tools. All codec modules were further improved and revised for enhanced performance, in particular at very low bitrates. The new unified coding scheme outperforms dedicated speech and general audio coding schemes and bridges the gap between both worlds. This paper describes the new codec in detail and shows how the goal of consistent high quality for all signal types is reached.
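
The source-model idea can be illustrated with linear prediction. The vocal tract is approximated by an all-pole filter, and the signal is whitened by the inverse filter A(z). The following is a minimal sketch under stated assumptions. It fits the predictor with the Levinson-Durbin recursion on the frame's autocorrelation, then computes the residual. Roughly speaking, that residual is the excitation a CELP-type coder such as ACELP then has to represent. The function names and the synthetic AR(2) test signal are illustrative assumptions. This is not the LPC analysis specified for USAC, which involves further steps (for example analysis windowing and quantization of the filter coefficients) not shown here.

```python
import numpy as np


def autocorrelation(x: np.ndarray, order: int) -> np.ndarray:
    """Biased autocorrelation r[0..order] of a 1-D frame."""
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])


def levinson_durbin(r: np.ndarray, order: int):
    """Fit an all-pole (autoregressive) model to autocorrelation r.

    Returns (a, err) where a[0] = 1 and the prediction error filter is
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order; err is the residual energy.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Correlation at lag i not yet explained by the order-(i-1) predictor.
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err  # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_residual(x: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Apply the whitening filter A(z): e[n] = sum_k a[k] * x[n - k]."""
    return np.convolve(x, a)[: len(x)]


if __name__ == "__main__":
    # Hypothetical test frame: white excitation through a fixed all-pole
    # filter, standing in for an excitation shaped by the vocal tract.
    rng = np.random.default_rng(0)
    w = rng.standard_normal(20000)
    x = np.zeros_like(w)
    for n in range(len(w)):
        x[n] = w[n]
        if n >= 1:
            x[n] += 1.5 * x[n - 1]
        if n >= 2:
            x[n] -= 0.7 * x[n - 2]

    a, _ = levinson_durbin(autocorrelation(x, 2), 2)
    e = lpc_residual(x, a)
    print("estimated A(z) coefficients:", np.round(a, 2))
    print("prediction gain (dB):", 10 * np.log10(np.var(x) / np.var(e)))
```

The estimated coefficients should come out close to the generating values [1, -1.5, 0.7], and the residual has much lower variance than the input. The autocorrelation (biased) method is used here because it yields a positive-definite Toeplitz system. That keeps every reflection coefficient below 1 in magnitude, so the fitted synthesis filter 1/A(z) is stable.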
