An open speech resource for Tibetan multi-dialect and multitask recognition

This paper introduces a Tibetan multi-dialect speech data resource for multitask speech research. It can be used for Tibetan multi-dialect speech recognition, Tibetan speaker recognition, Tibetan dialect identification, and Tibetan speech synthesis. The resource consists of 30 hours of the Lhasa-U-Tsang dialect; 8.7 hours of the Kham dialect, comprising 3.4 hours of the Yushu dialect, 3.3 hours of the Dege dialect, and 2 hours of the Changdu dialect; and 10 hours of the Amdo pastoral dialect. For the Lhasa-U-Tsang dialect, additional resources are provided: a phoneme set, a pronunciation dictionary, and the code for building a Lhasa-U-Tsang speech recognition baseline system. For Tibetan multi-dialect and multitask speech recognition, the code and recognition results of a WaveNet model trained with connectionist temporal classification (WaveNet-CTC) are also provided. All resources are publicly available and free for research use. They help address the shortage of public Tibetan multi-dialect speech resources and are intended to promote the development of Tibetan multi-dialect speech processing technology.
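As a quick consistency check on the corpus composition stated above, the following sketch tallies the durations per dialect and overall. The dictionary layout and the helper name `dialect_totals` are illustrative assumptions, not part of the released resource; the hour values are taken directly from the abstract.

```python
# Durations in hours per dialect and sub-dialect, as stated in the abstract.
# This layout is a hypothetical illustration, not the corpus's actual metadata format.
corpus_hours = {
    "Lhasa-U-Tsang": {"Lhasa-U-Tsang": 30.0},
    "Kham": {"Yushu": 3.4, "Dege": 3.3, "Changdu": 2.0},
    "Amdo pastoral": {"Amdo pastoral": 10.0},
}


def dialect_totals(hours):
    """Sum sub-dialect hours for each dialect, rounded to one decimal place
    to absorb floating-point error (e.g. 3.4 + 3.3 + 2.0)."""
    return {dialect: round(sum(parts.values()), 1) for dialect, parts in hours.items()}


totals = dialect_totals(corpus_hours)
grand_total = round(sum(totals.values()), 1)

print(totals)       # {'Lhasa-U-Tsang': 30.0, 'Kham': 8.7, 'Amdo pastoral': 10.0}
print(grand_total)  # 48.7
```

The Kham sub-dialects sum to the stated 8.7 hours, giving a total of 48.7 hours across the three dialects.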