Chinese Paraphrase Dataset and Detection

Linguistic diversity is one of the core challenges facing natural language processing, and paraphrase is a direct expression of that diversity. In recent years, Chinese paraphrase technology has received wide attention from researchers in both academia and industry. A number of Chinese paraphrase datasets have been published, and paraphrase detection evaluation tasks have been released. However, most of these datasets have not been widely used. To address this issue, we evaluate the value of current Chinese paraphrase datasets through the paraphrase detection task, and we examine whether the value of paraphrase data is weakened in the era of pre-trained language models such as BERT. The experimental results show that current Chinese paraphrase datasets significantly improve performance on the paraphrase detection task, and that the paraphrase detection performance achieved by BERT can be further improved with paraphrase datasets. These results demonstrate the value of Chinese paraphrase datasets for the paraphrase detection task.