Perceptual phase quantization of speech

Modern speech and audio coders must exploit the perceptual characteristics of human hearing. So far, however, this effort has been confined to the magnitude information of speech, and little attention has been paid to phase information. This paper presents a quantitative study of human phase perception and proposes a novel method for quantizing phase information in speech and audio signals. First, the just-noticeable difference (JND) of phase is measured for each harmonic of flat-spectrum periodic tones at several fundamental frequencies. Then, a mathematical model of the JND is derived from the measured data and used to form a weighting function for phase quantization. Because this weighting function is derived from psychoacoustic measurements, it lets the quantizer assign more bits to perceptually important phase components at the expense of less important ones, so that the quantized signal is perceptually closer to the original. Experiments on five vowel signals demonstrate that the proposed weighting function is effective for quantizing phase information.
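
The abstract gives neither the fitted JND model nor the exact quantizer, so the following is only a minimal sketch of the general idea of JND-weighted phase quantization, under loudly stated assumptions. The per-harmonic JND values are hypothetical placeholders, not the measured data. A phase error is weighted by 1/JND², so an error equal to a harmonic's JND costs the same for every harmonic. Each harmonic phase uses a uniform scalar quantizer on [-π, π). Bits are distributed by a greedy marginal-return rule, a standard bit-allocation technique swapped in here for illustration and not necessarily the paper's method. The helper names (`allocate_bits`, `quantize_phase`, `phase_mse`, `wrap`) are all hypothetical.

```python
import math

TWO_PI = 2.0 * math.pi


def wrap(phi: float) -> float:
    """Wrap an angle to the interval [-pi, pi)."""
    return (phi + math.pi) % TWO_PI - math.pi


def phase_mse(bits: int) -> float:
    """Mean-squared error of a uniform phase quantizer with 2**bits levels,
    assuming the error is uniform within one step (step**2 / 12).
    With 0 bits the phase is not sent and the error spans the whole circle."""
    step = TWO_PI / (2 ** bits)
    return step ** 2 / 12.0


def allocate_bits(jnd: list[float], total_bits: int, max_bits: int = 8) -> list[int]:
    """Greedy bit allocation for harmonic phases (illustrative assumption).

    Harmonic k gets weight 1 / jnd[k]**2. Each bit goes to the harmonic whose
    weighted quantization error is currently largest, since one extra bit
    cuts that harmonic's error by a factor of 4."""
    weights = [1.0 / (j * j) for j in jnd]
    bits = [0] * len(jnd)
    for _ in range(total_bits):
        candidates = [k for k in range(len(jnd)) if bits[k] < max_bits]
        if not candidates:
            break  # every harmonic already uses max_bits
        best = max(candidates, key=lambda k: weights[k] * phase_mse(bits[k]))
        bits[best] += 1
    return bits


def quantize_phase(phi: float, bits: int) -> float:
    """Uniform quantizer on [-pi, pi) with 2**bits reconstruction levels.
    With 0 bits the phase is reconstructed as 0."""
    if bits == 0:
        return 0.0
    levels = 2 ** bits
    step = TWO_PI / levels
    # Nearest level; index == levels wraps around to the level at -pi.
    index = round((wrap(phi) + math.pi) / step) % levels
    return -math.pi + index * step


if __name__ == "__main__":
    jnd = [0.2, 0.3, 0.5, 0.8, 1.2]       # hypothetical JNDs (radians), harmonics 1..5
    phases = [0.4, -2.0, 1.3, 3.0, -0.7]  # hypothetical harmonic phases (radians)
    bits = allocate_bits(jnd, total_bits=12)
    decoded = [quantize_phase(p, b) for p, b in zip(phases, bits)]
    for k, (p, b, q) in enumerate(zip(phases, bits, decoded), start=1):
        print(f"harmonic {k}: {b} bits, phase {p:+.3f} -> {q:+.3f}")
```

Harmonics with smaller (hypothetical) JNDs receive more bits, which mirrors the abstract's point that perceptually important phase components are favored at the expense of less important ones. Replacing the placeholder `jnd` list with a JND model evaluated at each harmonic's frequency and the signal's fundamental frequency would be the natural next step.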
