Wideband Speech and Audio Coding in the Perceptual Domain

A new critical-band auditory filterbank with superior auditory masking properties is proposed and applied to wideband speech and audio coding. Both analysis and synthesis are performed in the perceptual domain using this filterbank. The outputs of the analysis filters are processed to obtain a series of pulse trains that represent neural firing. Simultaneous and temporal masking models are then applied to reduce the number of pulses, yielding a compact time-frequency parameterization. The remaining pulse amplitudes and positions are coded using a run-length coding algorithm. The resulting coder produces high-quality coded speech and audio with both temporal and spectral fidelity.
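
The abstract does not specify the run-length scheme, so the following is only a minimal sketch of how a sparse pulse train could be run-length coded. It assumes one channel's pulse train is a list of integer amplitudes in which most samples are zero after masking; the function names `rle_encode_pulses` and `rle_decode_pulses` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

# Hypothetical helpers: illustrative only, not the paper's actual coding scheme.

def rle_encode_pulses(pulses: List[int]) -> Tuple[List[Tuple[int, int]], int]:
    """Encode a sparse pulse train as (zero_run, amplitude) pairs.

    Each pair stores the number of zero samples preceding a nonzero pulse
    and that pulse's amplitude. The count of trailing zeros is returned
    separately so the original length can be restored.
    """
    pairs: List[Tuple[int, int]] = []
    run = 0
    for amp in pulses:
        if amp == 0:
            run += 1          # extend the current run of silent samples
        else:
            pairs.append((run, amp))
            run = 0           # a pulse ends the run
    return pairs, run


def rle_decode_pulses(pairs: List[Tuple[int, int]], trailing: int) -> List[int]:
    """Invert rle_encode_pulses, restoring pulse positions and amplitudes."""
    out: List[int] = []
    for run, amp in pairs:
        out.extend([0] * run)
        out.append(amp)
    out.extend([0] * trailing)
    return out


if __name__ == "__main__":
    train = [0, 0, 5, 0, 0, 0, 3, 0]   # assumed example channel output
    pairs, trailing = rle_encode_pulses(train)
    print(pairs, trailing)
    print(rle_decode_pulses(pairs, trailing) == train)
```

The zero-run lengths carry the pulse positions implicitly, so the fewer pulses survive masking, the fewer pairs need to be transmitted.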
