End-to-end Speech Synthesis for Tibetan Lhasa Dialect

Tibetan is an important minority language in China, yet research on it remains scarce. This work implements speech synthesis for the Tibetan Lhasa dialect on the basis of Tacotron, a recent end-to-end speech synthesis framework. The training transcripts use a phoneme list transcribed from Tibetan characters, and the acoustic feature parameters are extracted from the mel-spectrogram. The model is then trained to learn the mapping from the character sequence to the spectrum. The experimental results were compared with those of a traditional speech synthesis method: the synthesized audio is significantly better than that of the traditional GMM-HMM system in both naturalness and rhythm. The work provides a reference for later research on Tibetan speech synthesis and promotes the development of Tibetan language research.
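
As an illustration of the input side of such a pipeline, the following minimal sketch shows how a phoneme transcript could be turned into the integer sequences that a Tacotron-style encoder consumes. It is not the authors' implementation: the phoneme inventory, the `<pad>` and `<eos>` symbols, and the helper names `encode` and `pad_batch` are all hypothetical placeholders chosen for the example.

```python
# Minimal sketch: phoneme transcript -> padded integer ID sequences.
# All symbols below are hypothetical placeholders, not the paper's phoneme list.

PAD, EOS = "<pad>", "<eos>"
PHONEMES = ["k", "kh", "ng", "a", "i", "u", "e", "o"]  # hypothetical inventory
SYMBOLS = [PAD, EOS] + PHONEMES
SYMBOL_TO_ID = {symbol: index for index, symbol in enumerate(SYMBOLS)}


def encode(phonemes):
    """Map a list of phoneme strings to integer IDs and append the EOS ID."""
    ids = []
    for phoneme in phonemes:
        if phoneme not in SYMBOL_TO_ID:
            raise ValueError(f"unknown phoneme: {phoneme!r}")
        ids.append(SYMBOL_TO_ID[phoneme])
    return ids + [SYMBOL_TO_ID[EOS]]


def pad_batch(sequences):
    """Right-pad every ID sequence with the PAD ID to the longest length."""
    max_len = max(len(seq) for seq in sequences)
    pad_id = SYMBOL_TO_ID[PAD]
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]


# Two toy utterances, already split into (hypothetical) phonemes.
batch = pad_batch([encode(["kh", "a"]), encode(["ng", "a", "k", "i"])])
print(batch)
```

In a full system, each padded ID sequence would be paired with the mel-spectrogram frames of the corresponding recording, which serve as the training target for the character-to-spectrum mapping.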