Voice conversion between UK and US accented English

This paper presents an HMM-based method and experimental results for voice conversion between UK- and US-accented English. Phonetic-tree-based tied-state triphone HMMs are used to map equivalent states of the source and target spectra. A linear transformation is then applied to estimate the most likely target spectrum for a given input. The mapping is between two different phoneme sets, i.e. the 44-phoneme UK English BEEP phone set and the 39-phoneme US CMU phone set. Finally, prosody adaptation is applied to tune the prosodic parameters. The experiments are based on voice conversion between speakers speaking different unrestricted texts. Acoustic-phonetic mapping between databases of two different accents enables us to attempt to deconstruct accents and investigate how they are distributed among parameters such as spectra, energy contour, pitch, and duration.
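As a rough illustration of the state-wise linear transformation step, the sketch below fits a separate scale and offset for each feature dimension within each mapped state, using ordinary least squares on time-aligned source and target frames. The abstract does not specify the form of the transformation, so the diagonal (per-dimension) model, the function names `fit_state_transforms` and `convert`, and the assumption that frames arrive already aligned and labelled with a shared state index are all hypothetical choices made for this example, not the authors' method.

```python
from collections import defaultdict


def fit_state_transforms(src, tgt, states):
    """Fit y ~ a*x + b per state and per feature dimension by least squares.

    src, tgt: lists of equal-length feature vectors (time-aligned frames).
    states:   one state label per frame (assumed shared by source and target).
    Returns a dict mapping state label -> list of (a, b) pairs, one per dimension.
    """
    grouped = defaultdict(list)
    for x, y, s in zip(src, tgt, states):
        grouped[s].append((x, y))

    transforms = {}
    for s, pairs in grouped.items():
        n = len(pairs)
        dim = len(pairs[0][0])
        params = []
        for d in range(dim):
            xs = [p[0][d] for p in pairs]
            ys = [p[1][d] for p in pairs]
            mx, my = sum(xs) / n, sum(ys) / n
            var = sum((x - mx) ** 2 for x in xs)
            cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            # With no variance in the source, fall back to a pure mean shift.
            a = cov / var if var > 0 else 1.0
            params.append((a, my - a * mx))
        transforms[s] = params
    return transforms


def convert(frames, states, transforms):
    """Apply the fitted per-state transform to each source frame."""
    return [
        [a * x + b for x, (a, b) in zip(frame, transforms[s])]
        for frame, s in zip(frames, states)
    ]
```

A full-matrix transform per state, or one weighted by state occupation probabilities, would be a natural extension; the diagonal form is used here only to keep the example dependency-free.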
