Monolingual Corpus Driven Vietnamese-Chinese Neural Machine Translation

Neural machine translation (NMT) usually requires a large, high-quality parallel corpus as training data, and the lack of such data limits NMT performance for low-resource languages. This paper investigates how to make full use of readily accessible monolingual corpora to drive NMT model training. First, a method for collecting a large Vietnamese monolingual corpus is presented. Then, using back-translation, the monolingual corpus is converted into a Vietnamese-Chinese bilingual corpus that is used for NMT model training. Finally, a web application with a visual interface for Vietnamese-Chinese NMT is implemented to demonstrate the results of training.
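The back-translation step can be illustrated with a minimal sketch. It assumes that each Vietnamese sentence is passed through some existing Vietnamese-to-Chinese translator and that the machine output is paired with the original sentence. The function name `build_synthetic_pairs`, the `translate_vi_to_zh` callable, and the toy dictionary translator are hypothetical placeholders for illustration, not the system used in this paper.

```python
from typing import Callable, Iterable, List, Tuple


def build_synthetic_pairs(
    vi_sentences: Iterable[str],
    translate_vi_to_zh: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Turn monolingual Vietnamese text into (Chinese, Vietnamese) pairs.

    translate_vi_to_zh is a hypothetical stand-in for whatever existing
    translator produces the synthetic Chinese side.
    """
    pairs: List[Tuple[str, str]] = []
    seen = set()
    for sentence in vi_sentences:
        vi = sentence.strip()
        # Skip blank lines and exact duplicates from the collected corpus.
        if not vi or vi in seen:
            continue
        seen.add(vi)
        zh = translate_vi_to_zh(vi).strip()
        # Drop sentences for which the translator returned nothing.
        if not zh:
            continue
        # Synthetic Chinese side, authentic Vietnamese side.
        pairs.append((zh, vi))
    return pairs


if __name__ == "__main__":
    # Toy dictionary "translator", used only to make the sketch runnable.
    toy_table = {"xin chào": "你好", "cảm ơn": "谢谢"}
    corpus = ["xin chào", " xin chào ", "", "cảm ơn"]
    for zh, vi in build_synthetic_pairs(corpus, lambda s: toy_table.get(s, "")):
        print(zh, "\t", vi)
```

In this sketch the authentic Vietnamese sentence is kept unchanged and only the Chinese side is machine-generated, so the human-written text stays on one side of every pair. Which side serves as the training target depends on the translation direction being trained.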