Analysis-by-synthesis method for whisper-speech reconstruction

This paper presents a method for real-time conversion of whispers to normal phonated speech using a code-excited linear prediction (CELP) analysis-by-synthesis codec. The approach uses a template of the speaker's normal phonated speech to extract excitation parameters such as pitch and gain, then injects these estimated excitations into the whispered signal to synthesize normal-sounding speech through the CELP codec. Because restoring pitch to whispered speech raises quality and accuracy concerns, two spectral enhancements are applied: formant shifting, accomplished by adjusting line spectral pairs (LSPs), and pitch injection controlled by a voiced/unvoiced decision. Building the method on the widely used CELP codec allows the technique to be integrated with modern speech applications and devices. Subjective test results are presented to assess the effectiveness of the technique.
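
The pitch-injection idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes 8 kHz sampling, 160-sample frames, and a per-frame voiced/unvoiced flag, pitch, and gain that have already been estimated. A real CELP codec uses adaptive and fixed codebooks rather than a bare pulse train. The function names `build_excitation` and `lpc_synthesize` are hypothetical.

```python
import numpy as np

def build_excitation(voiced, pitch_hz, gain, frame_len=160, fs=8000, seed=0):
    """Build a per-frame excitation signal (illustrative assumption, not CELP).

    Voiced frames get a pulse train at the given pitch; unvoiced frames
    get Gaussian noise. Both are scaled by the per-frame gain.
    """
    rng = np.random.default_rng(seed)
    out = np.zeros(len(voiced) * frame_len)
    next_pulse = 0  # sample index of the next pitch pulse, carried across frames
    for i, is_voiced in enumerate(voiced):
        start, end = i * frame_len, (i + 1) * frame_len
        if is_voiced:
            period = int(round(fs / pitch_hz[i]))  # pitch period in samples
            next_pulse = max(next_pulse, start)
            while next_pulse < end:
                out[next_pulse] = gain[i]
                next_pulse += period
        else:
            out[start:end] = gain[i] * rng.standard_normal(frame_len)
    return out

def lpc_synthesize(excitation, a):
    """All-pole synthesis filter 1/A(z), with A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ..."""
    y = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * y[n - k]
        y[n] = acc
    return y

# Usage: one voiced frame at 100 Hz followed by one unvoiced frame.
exc = build_excitation([True, False], [100.0, 0.0], [1.0, 0.1])
speech = lpc_synthesize(exc, a=[-0.5])  # toy first-order filter coefficients
```

In the method described above, the filter coefficients would come from the whispered signal's LPC analysis after LSP adjustment, while the pitch and gain would come from the phonated-speech template.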
