An Algorithm for Phase Manipulation in a Speech Signal

While human auditory system is predominantly sensitive to the amplitude spectrum of an incoming sound, a number of sound perception studies have shown that the phase spectrum is also perceptually relevant. In case of speech, its relevance can be established through experiments with speech vocoding or parametric speech synthesis, where particular ways of manipulating the phase of voiced excitation (i.e. setting it to zero or random values) can be shown to affect voice quality. In such experiments the phase should be manipulated with as little distortion of the amplitude spectrum as possible, lest the degradation in voice quality perceived through listening tests, caused by the distortion of amplitude spectrum, be incorrectly attributed to the influence of phase. The paper presents an algorithm for phase manipulation of a speech signal, based on inverse filtering, which introduces negligible distortion into the amplitude spectrum, and demonstrates its accuracy on a number of examples.

[1]  H. Helmholtz,et al.  Ueber die Klangfarbe der Vocale , 1859 .

[2]  Heiga Zen,et al.  The HMM-based speech synthesis system (HTS) version 2.0 , 2007, SSW.

[3]  W. Bastiaan Kleijn,et al.  On phase perception in speech , 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[4]  M.R. Schroeder,et al.  Models of hearing , 1975, Proceedings of the IEEE.

[5]  Kuldip K. Paliwal,et al.  On the usefulness of STFT phase spectrum in human listening tests , 2005, Speech Commun..

[6]  A.V. Oppenheim,et al.  The importance of phase in signals , 1980, Proceedings of the IEEE.

[7]  Parham Aarabi,et al.  On the importance of phase in human speech recognition , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[8]  Jae S. Lim,et al.  The unimportance of phase in speech enhancement , 1982 .

[9]  G. S. Ohm Ueber die Definition des Tones, nebst daran geknüpfter Theorie der Sirene und ähnlicher tonbildender Vorrichtungen , 1843 .

[10]  R. Patterson,et al.  A pulse ribbon model of monaural phase perception. , 1987, The Journal of the Acoustical Society of America.

[11]  Hermann Ney,et al.  Using phase spectrum information for improved speech recognition performance , 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[12]  Lauri Juvela,et al.  Phase perception of the glottal excitation and its relevance in statistical parametric speech synthesis , 2016, Speech Commun..

[13]  A.V. Oppenheim,et al.  Enhancement and bandwidth compression of noisy speech , 1979, Proceedings of the IEEE.

[14]  R. Plomp,et al.  Effect of phase on the timbre of complex tones. , 1969, The Journal of the Acoustical Society of America.