Binaural cue coding-Part I: psychoacoustic fundamentals and design principles

Binaural Cue Coding (BCC) is a method for multichannel spatial rendering based on one down-mixed audio channel and BCC side information. The BCC side information has a low data rate and it is derived from the multichannel encoder input signal. A natural application of BCC is multichannel audio data rate reduction since only a single down-mixed audio channel needs to be transmitted. An alternative BCC scheme for efficient joint transmission of independent source signals supports flexible spatial rendering at the decoder. This paper (Part I) discusses the most relevant binaural perception phenomena exploited by BCC. Based on that, it presents a psychoacoustically motivated approach for designing a BCC analyzer and synthesizer. This leads to a reference implementation for analysis and synthesis of stereophonic audio signals based on a Cochlear Filter Bank. BCC synthesizer implementations based on the FFT are presented as low-complexity alternatives. A subjective audio quality assessment of these implementations shows the robust performance of BCC for critical speech and audio material. Moreover, the results suggest that the performance given by the reference synthesizer is not significantly compromised when using a low-complexity FFT-based synthesizer. The companion paper (Part II) generalizes BCC analysis and synthesis for multichannel audio and proposes complete BCC schemes including quantization and coding. Part II also describes an alternative BCC scheme with flexible rendering capability at the decoder and proposes several applications for both BCC schemes.
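As a rough illustration of the analysis idea summarized above (one down-mixed channel plus low-rate side information), the following minimal sketch computes a mono down-mix of a stereo pair and one spatial cue per frequency band using a DFT. It is not the paper's reference implementation. The function names `dft` and `bcc_analyze`, the choice of a per-band level difference as the cue, the averaging down-mix, and the uniform band partition are all assumptions made for illustration. The abstract does not specify them, and a psychoacoustically motivated design would use bands closer to auditory filter bandwidths.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT returning bins 0..N/2 (sufficient for real input)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]


def bcc_analyze(left, right, n_bands=4, eps=1e-12):
    """Hypothetical analyzer: return (mono down-mix, per-band level difference in dB).

    Assumptions (not taken from the paper): the down-mix is a plain average,
    the bands are uniform in frequency, and the only cue is the level
    difference 10*log10(P_right / P_left) in each band.
    """
    downmix = [0.5 * (l + r) for l, r in zip(left, right)]
    spec_l, spec_r = dft(left), dft(right)
    n_bins = len(spec_l)
    # Uniform band edges over the available bins (illustrative only).
    edges = [round(b * n_bins / n_bands) for b in range(n_bands + 1)]
    level_diff_db = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # eps keeps the ratio finite in bands with no signal energy.
        p_l = sum(abs(c) ** 2 for c in spec_l[lo:hi]) + eps
        p_r = sum(abs(c) ** 2 for c in spec_r[lo:hi]) + eps
        level_diff_db.append(10.0 * math.log10(p_r / p_l))
    return downmix, level_diff_db


# Example: the right channel is the left channel attenuated by a factor of 0.5.
n = 64
left = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
right = [0.5 * v for v in left]
downmix, level_diff_db = bcc_analyze(left, right)
```

In this example the sinusoid falls in the lowest band, so that band's level difference is about -6 dB (an amplitude ratio of 0.5), while the bands with no signal energy report roughly 0 dB. A decoder-side synthesizer would apply such per-band cues to the down-mix to restore the spatial image. That step is omitted here.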
