On the perceptually irrelevant phase information in sinusoidal representation of speech

For efficient quantization of speech representations, it is essential to incorporate the perceptual characteristics of human hearing. However, attention has so far been confined to the magnitude information of speech, and phase information has been largely neglected. This paper presents a novel approach, termed perceptually irrelevant phase elimination (PIPE), for identifying the phase information in acoustic signals that is irrelevant to perceived quality. The proposed method, inspired by the observation that the relative phase relationship within a critical band is perceptually important, is derived not only for stationary Fourier signals but also for harmonic signals. For harmonic signals, a "critical phase frequency" is defined, below which phase information is perceptually irrelevant. The PIPE algorithm is incorporated into the harmonic analysis/synthesis of speech, and subjective test results demonstrate the effectiveness of the proposed method.
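The idea of a critical phase frequency for harmonic signals can be illustrated with a rough sketch. The abstract does not give the paper's definition, so the code below makes a simplifying assumption: phase is treated as irrelevant wherever the auditory filter bandwidth is narrower than the harmonic spacing (the fundamental frequency), so that no two harmonics share a critical band. The bandwidth is approximated with the Glasberg and Moore equivalent rectangular bandwidth (ERB) formula; the function names are hypothetical and are not taken from the paper.

```python
def erb_hz(f_hz):
    """Glasberg-Moore equivalent rectangular bandwidth (Hz) at centre frequency f_hz."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def critical_phase_frequency(f0_hz):
    """Frequency (Hz) at which the ERB equals the harmonic spacing f0_hz.

    Assumption (not the paper's stated definition): below this frequency the
    auditory filter is narrower than the spacing between harmonics, so each
    filter is taken to contain at most one harmonic and relative phase within
    the band is taken to be undefined.
    """
    if f0_hz <= 24.7:
        # The ERB never drops below 24.7 Hz, so no frequency qualifies.
        return 0.0
    return (f0_hz / 24.7 - 1.0) * 1000.0 / 4.37


def phase_irrelevant_harmonics(f0_hz, n_harmonics):
    """Indices k (1-based) of harmonics k*f0 lying below the critical phase frequency."""
    fc = critical_phase_frequency(f0_hz)
    return [k for k in range(1, n_harmonics + 1) if k * f0_hz < fc]


if __name__ == "__main__":
    f0 = 100.0  # example fundamental frequency in Hz
    print(critical_phase_frequency(f0))
    print(phase_irrelevant_harmonics(f0, 40))
```

Under this assumption a higher-pitched voice has a higher critical phase frequency, so more of its harmonic phases could be dropped before quantization. Real auditory filters overlap and have sloping skirts, so the threshold used in the paper may differ from this estimate.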
