Data Augmentation Using CycleGAN for End-to-End Children ASR
Recent deep learning algorithms are known to perform well for Automatic Speech Recognition (ASR) of adult speakers; however, recognizing children's speech with similar accuracy remains a challenge. Because little children's speech data is available for training deep neural networks, data augmentation is a key research area for children's ASR. In this paper, voice conversion-based data augmentation using CycleGAN is explored, and ASR performance with and without data augmentation is compared. ASR experiments were performed on the TLT school corpus. In our experiments, adding 200 hours of adult speech converted with CycleGAN improved performance, reducing WER by 5.58% compared to the baseline system. In addition, combining SpecAugment, speed perturbation, and CycleGAN-converted adult speech gave the largest reduction, 7.44% WER, compared to baseline system 1.
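SpecAugment, one of the three augmentations combined above, masks random frequency bands and time spans of the input features. Below is a minimal pure-Python sketch of that masking, assuming a log-mel spectrogram stored as a `[time][freq]` list of lists. The function name `spec_augment`, its default mask widths and counts, and filling masked cells with 0.0 (the feature mean, assuming mean-normalized features) are illustrative assumptions, not the paper's configuration. The time-warping step of the original SpecAugment recipe is omitted.

```python
import random
from typing import List, Optional

# Hypothetical feature layout: spec[t][k] = log-mel energy of frame t, bin k.
Spectrogram = List[List[float]]


def spec_augment(spec: Spectrogram,
                 freq_mask_width: int = 8,    # assumed max width of each frequency mask
                 time_mask_width: int = 10,   # assumed max width of each time mask
                 num_freq_masks: int = 2,
                 num_time_masks: int = 2,
                 rng: Optional[random.Random] = None) -> Spectrogram:
    """Return a masked copy of `spec`; the input is left unmodified."""
    rng = rng or random.Random()
    out = [row[:] for row in spec]           # copy so augmentation is non-destructive
    num_frames = len(out)
    num_bins = len(out[0]) if num_frames else 0

    # Frequency masking: zero a band of f consecutive bins across all frames.
    for _ in range(num_freq_masks):
        f = rng.randint(0, min(freq_mask_width, num_bins))
        f0 = rng.randint(0, num_bins - f)
        for t in range(num_frames):
            for k in range(f0, f0 + f):
                out[t][k] = 0.0

    # Time masking: zero w consecutive frames across all bins.
    for _ in range(num_time_masks):
        w = rng.randint(0, min(time_mask_width, num_frames))
        t0 = rng.randint(0, num_frames - w)
        for t in range(t0, t0 + w):
            out[t] = [0.0] * num_bins
    return out


if __name__ == "__main__":
    spec = [[1.0] * 40 for _ in range(100)]  # toy 100-frame, 40-bin input
    aug = spec_augment(spec, rng=random.Random(0))
    print(len(aug), len(aug[0]))             # shape is preserved: 100 40
```

Masks are drawn independently and may overlap; setting both widths to zero returns an unchanged copy, which is a convenient way to disable the augmentation for evaluation data.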