Very low bit rate speech coding based on HMM with speaker adaptation

This paper discusses a speaker adaptation method for the HMM-based phonetic vocoder, a very low bit rate speech coding system built on HMM-based speech recognition and speech synthesis. In the HMM-based phonetic vocoder, the speech quality is governed entirely by the speech synthesis HMM in the decoder. Consequently, to handle an unspecified input speaker, the decoder's HMM must be adapted to the input speech. This paper therefore proposes the following adaptation scheme. First, speech recognition matches the HMM to the input parameter sequence. Then, for each segment, the mean vectors of the HMM output distribution sequence are translated uniformly in the parameter space. The quantity expressing this translation is called the translation vector in this paper. The encoder determines the translation vector, which is then quantized and transmitted. A subjective evaluation experiment shows that when the translation vector is quantized by the proposed method at approximately 100 bits/s and a speaker-independent HMM is adapted using the translation vector, the speech quality is almost the same as that obtained when a speaker-dependent model is trained on speech data from the input speaker. © 2006 Wiley Periodicals, Inc. Syst Comp Jpn, 37(2): 67–78, 2006; Published online in Wiley InterScience. DOI 10.1002/scj.10503
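The per-segment translation described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the encoder estimates or quantizes the translation vector, so everything specific here is an assumption made for illustration: the shift is taken as the unweighted average difference between the input frames and the aligned HMM mean vectors, quantization is a nearest-neighbor search in a small hypothetical codebook, and the function names and toy values are invented.

```python
from typing import List

Vector = List[float]


def translation_vector(frames: List[Vector], means: List[Vector]) -> Vector:
    """Encoder side (assumed criterion): average of (frame - aligned mean)
    over one segment, i.e. the unweighted least-squares uniform shift."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] - m[d] for f, m in zip(frames, means)) / n
            for d in range(dim)]


def quantize(vec: Vector, codebook: List[Vector]) -> int:
    """Return the index of the nearest codebook entry (squared Euclidean).
    Only this index would need to be transmitted."""
    def sq_dist(entry: Vector) -> float:
        return sum((v - e) ** 2 for v, e in zip(vec, entry))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def adapt_means(means: List[Vector], shift: Vector) -> List[Vector]:
    """Decoder side: translate every mean vector of the segment uniformly."""
    return [[m[d] + shift[d] for d in range(len(shift))] for m in means]


# Toy segment: three frames aligned to three HMM state means (2-D parameters).
frames = [[1.0, 2.0], [2.0, 2.0], [3.0, 4.0]]
means = [[0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
codebook = [[0.0, 0.0], [1.0, 1.0], [1.0, 1.5]]  # hypothetical codebook

shift = translation_vector(frames, means)      # estimated by the encoder
index = quantize(shift, codebook)              # value sent over the channel
adapted = adapt_means(means, codebook[index])  # reconstructed by the decoder
```

In this sketch only the codebook index is sent for each segment, in addition to whatever the vocoder already transmits, which is how a uniform shift can keep the adaptation overhead small.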
