FASPell: A Fast, Adaptable, Simple, Powerful Chinese Spell Checker Based On DAE-Decoder Paradigm

We propose FASPell, a Chinese spell checker based on a new paradigm that consists of a denoising autoencoder (DAE) and a decoder. Compared with previous state-of-the-art models, the new paradigm allows our spell checker to be Faster in computation, readily Adaptable to both simplified and traditional Chinese texts produced by either humans or machines, and Simpler in structure while remaining equally Powerful in both error detection and correction. These four achievements are made possible because the new paradigm circumvents two bottlenecks. First, the DAE curtails the amount of Chinese spell checking data needed for supervised learning (to <10k sentences) by leveraging the power of masked language models pre-trained without supervision, as in BERT, XLNet, MASS, etc. Second, the decoder eliminates the need for a confusion set, which is inflexible and insufficient for exploiting the salient feature of Chinese character similarity.
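The division of labor described above can be illustrated with a minimal toy sketch: a masked language model (the DAE) proposes candidate characters with confidence scores for a suspicious position, and the decoder re-ranks them by combining confidence with a graded similarity to the original character, rather than filtering through a fixed confusion set. The candidate confidences, similarity scores, and the `decode` helper below are all hypothetical illustrations, not the paper's actual model or weighting scheme.

```python
def decode(original, candidates, similarity, weight=0.5):
    """Pick a correction for `original` by a weighted sum of the masked-LM
    confidence and the character-similarity score (both hypothetical here)."""
    best_char, best_score = original, float("-inf")
    for char, confidence in candidates.items():
        score = (1 - weight) * confidence + weight * similarity.get((original, char), 0.0)
        if score > best_score:
            best_char, best_score = char, score
    return best_char

# Toy example: the variant form 囯 appears where 国 is intended.
# Confidences stand in for a masked LM's output at that position;
# similarity scores stand in for visual/phonological character similarity.
candidates = {"国": 0.7, "围": 0.2, "囯": 0.05}
similarity = {("囯", "国"): 0.9, ("囯", "围"): 0.7, ("囯", "囯"): 1.0}
print(decode("囯", candidates, similarity))  # → 国
```

Because similarity enters the score continuously, a character outside any precompiled confusion set can still be selected when the language model is confident enough, which is the flexibility the confusion-set approach lacks.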

[1] Hsin-Hsi Chen et al. Introduction to SIGHAN 2015 Bake-off for Chinese Spelling Check, 2015, SIGHAN@IJCNLP.

[2] Jui-Feng Yeh et al. Chinese Word Spelling Correction Based on N-gram Ranked Inverted Index List, 2013, SIGHAN@IJCNLP.

[3] Jing Li et al. A Hybrid Approach to Automatic Corpus Generation for Chinese Spelling Check, 2018, EMNLP.

[4] Xu Tan et al. MASS: Masked Sequence to Sequence Pre-training for Language Generation, 2019, ICML.

[5] Deng Cai et al. A Hybrid Model for Chinese Spelling Check, 2017, ACM Trans. Asian Low Resour. Lang. Inf. Process.

[6] Yoshua Bengio et al. Extracting and composing robust features with denoising autoencoders, 2008, ICML '08.

[7] Xiang Bai et al. An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8] Hai Zhao et al. Spell Checking for Chinese, 2012, LREC.

[9] Nikolaus Augsten et al. Tree edit distance: Robust and memory-efficient, 2016, Inf. Syst.

[10] David Eppstein et al. Finding the k Shortest Paths, 1999, SIAM J. Comput.

[11] Lung-Hao Lee et al. Chinese Spelling Check Evaluation at SIGHAN Bake-off 2013, 2013, SIGHAN@IJCNLP.

[12] Yiming Yang et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.

[13] Chao-Lin Liu et al. Visually and Phonologically Similar Characters in Incorrect Simplified Chinese Words, 2010, COLING.

[14] Kam-Fai Wong et al. NLPTEA 2017 Shared Task - Chinese Spelling Check, 2017, NLP-TEA@IJCNLP.

[15] Jianpeng Hou et al. HANSpeller: A Unified Framework for Chinese Spelling Correction, 2015, ROCLING/IJCLCLP.

[16] Yuen-Hsien Tseng et al. Overview of SIGHAN 2014 Bake-off for Chinese Spelling Check, 2014, CIPS-SIGHAN.

[17] Zhenghua Li et al. Chinese Spelling Error Detection and Correction Based on Language Model, Pronunciation, and Shape, 2014, CIPS-SIGHAN.

[18] Ming-Wei Chang et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[19] Xiang Tong et al. A Statistical Approach to Automatic OCR Error Correction in Context, 1996, VLC@COLING.

[20] Nikolaus Augsten et al. Efficient Computation of the Tree Edit Distance, 2015, TODS.