Blocking artifacts in speech/audio: Dynamic auditory model-based characterization and optimal time-frequency smoothing

We revisit the problem of blocking artifacts and their suppression in generic frame-based speech and audio applications. We characterize the artifacts perceptually using dynamic auditory models. We propose magnitude and phase smoothing techniques based on the short-time Fourier transform (STFT) and show that localized time-frequency smoothing suppresses the artifacts to a large extent. Our experiments show that magnitude smoothing is superior to phase smoothing, and that phase smoothing in fact only degrades signal quality. We present examples on natural speech and audio signals in the context of compression.
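The abstract does not specify the smoothing operator, so the following is only a minimal sketch of what "localized time-frequency magnitude smoothing" could look like, not the authors' method. It assumes a hypothetical scheme: the decoded signal was processed in blocks of `block_len` samples; STFT frames whose centre lies within half a window of a block boundary have their magnitudes replaced by a small local average over neighbouring frames and frequency bins, while the original phase is kept. The function names (`stft`, `istft`, `smooth_block_boundaries`) and parameters (`n_fft`, `hop`, `time_radius`, `freq_radius`) are hypothetical illustrations.

```python
import numpy as np


def stft(x, n_fft, hop):
    """Frame x with a periodic Hann window and return the one-sided spectra."""
    win = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[t * hop:t * hop + n_fft] * win for t in range(n_frames)])
    return np.fft.rfft(frames, axis=1), win


def istft(X, win, hop, length):
    """Weighted overlap-add inverse: divide by the summed squared window."""
    n_fft = len(win)
    frames = np.fft.irfft(X, n=n_fft, axis=1)
    y = np.zeros(length)
    norm = np.zeros(length)
    for t, frame in enumerate(frames):
        y[t * hop:t * hop + n_fft] += frame * win
        norm[t * hop:t * hop + n_fft] += win ** 2
    nonzero = norm > 1e-8
    y[nonzero] /= norm[nonzero]
    return y


def smooth_block_boundaries(x, block_len, n_fft=256, hop=64,
                            time_radius=1, freq_radius=1):
    """Hypothetical localized STFT magnitude smoothing around block boundaries.

    Only frames whose centre lies within n_fft // 2 samples of a block
    boundary are modified: their magnitudes are replaced by a local
    (2*time_radius+1) x (2*freq_radius+1) average, and the phase is kept.
    """
    x = np.asarray(x, dtype=float)
    pad = n_fft  # zero-pad so every original sample is covered by full frames
    xp = np.pad(x, pad)
    X, win = stft(xp, n_fft, hop)
    mag, phase = np.abs(X), np.angle(X)
    n_frames = X.shape[0]

    # Frame centres expressed in the original (unpadded) sample index.
    centres = np.arange(n_frames) * hop + n_fft // 2 - pad
    boundaries = np.arange(block_len, len(x), block_len)
    if boundaries.size:
        dist = np.min(np.abs(centres[:, None] - boundaries[None, :]), axis=1)
        near = dist <= n_fft // 2
    else:
        near = np.zeros(n_frames, dtype=bool)

    kernel = np.ones(2 * freq_radius + 1) / (2 * freq_radius + 1)
    smoothed = mag.copy()
    for t in np.flatnonzero(near):
        lo, hi = max(0, t - time_radius), min(n_frames, t + time_radius + 1)
        local = mag[lo:hi].mean(axis=0)  # average over neighbouring frames
        padded = np.pad(local, freq_radius, mode="edge")
        smoothed[t] = np.convolve(padded, kernel, mode="valid")  # average over neighbouring bins

    Y = smoothed * np.exp(1j * phase)  # magnitude smoothed, phase untouched
    y = istft(Y, win, hop, len(xp))
    return y[pad:pad + len(x)]
```

For example, `smooth_block_boundaries(decoded, block_len=512)` would alter only the neighbourhood of each 512-sample boundary and reconstruct every other sample exactly. Leaving the phase untouched mirrors the abstract's finding that magnitude smoothing is preferable to phase smoothing; the window length, hop, and neighbourhood sizes here are assumptions, not values from the paper.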
