Very low bit-rate F0 coding for phonetic vocoder using MSD-HMM with quantized F0 context

This paper presents a very low bit-rate F0 coding technique for a speaker-dependent phonetic vocoder based on hidden Markov models (HMMs) using quantized F0 context. In the proposed technique, the input F0 sequence is converted into an F0 symbol sequence at the phoneme level using scalar quantization. The quantized F0 symbols are used in the decoding process as the prosodic context for HMM-based speech synthesis. The synthetic speech is generated from the context-dependent labels and the input speaker's pre-trained HMMs by using the HMM-based parameter generation algorithm. By taking into account the preceding and succeeding phonemes and F0 symbols as contextual factors, we can generate a smooth F0 trajectory similar to that of the original with only a small number of quantization bits. Experimental results demonstrate that the proposed technique can generate an F0 contour of acceptable quality even when the bit-rate is less than 50 bps.
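The phoneme-level scalar quantization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the uniform quantizer on mean log-F0, the function names `quantize_f0_per_phoneme` and `f0_bit_rate`, the use of `None` for fully unvoiced phonemes, and all numeric values are assumptions made for illustration only.

```python
import math


def quantize_f0_per_phoneme(f0, segments, n_bits, f0_min, f0_max):
    """Map each phoneme's mean log-F0 to one of 2**n_bits uniform levels.

    f0       : per-frame F0 values in Hz, with 0.0 marking unvoiced frames
    segments : list of (start_frame, end_frame) per phoneme, end exclusive
    Returns one symbol per phoneme; None marks a fully unvoiced phoneme
    (an assumption of this sketch, not taken from the paper).
    """
    levels = 2 ** n_bits
    lo, hi = math.log(f0_min), math.log(f0_max)
    symbols = []
    for start, end in segments:
        # Average log-F0 over the voiced frames of this phoneme only.
        voiced = [math.log(v) for v in f0[start:end] if v > 0.0]
        if not voiced:
            symbols.append(None)
            continue
        mean = sum(voiced) / len(voiced)
        # Normalize to [0, 1], pick a level, and clamp out-of-range values.
        index = int((mean - lo) / (hi - lo) * levels)
        symbols.append(min(max(index, 0), levels - 1))
    return symbols


def f0_bit_rate(n_bits, phonemes_per_second):
    """Bits per second spent on F0 symbols alone (hypothetical helper)."""
    return n_bits * phonemes_per_second


if __name__ == "__main__":
    # Toy input: four two-frame phonemes, the first one unvoiced.
    f0 = [0.0, 0.0, 100.0, 100.0, 150.0, 150.0, 400.0, 400.0]
    segments = [(0, 2), (2, 4), (4, 6), (6, 8)]
    print(quantize_f0_per_phoneme(f0, segments, 2, 100.0, 400.0))
    print(f0_bit_rate(3, 12))
```

Because one symbol is sent per phoneme rather than per frame, the F0 rate scales with the speaking rate. With the illustrative figures above (3 bits per symbol, 12 phonemes per second, neither taken from the paper), the F0 stream would cost 36 bps, which is in the sub-50 bps range the abstract reports.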
