Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation

We combine two of the most popular approaches to automated Grammatical Error Correction (GEC): GEC based on Statistical Machine Translation (SMT) and GEC based on Neural Machine Translation (NMT). The hybrid system achieves new state-of-the-art results on the CoNLL-2014 and JFLEG benchmarks. This GEC system preserves the accuracy of SMT output and, at the same time, generates more fluent sentences as it typical for NMT. Our analysis shows that the created systems are closer to reaching human-level performance than any other GEC system reported so far.

[1]  Franz Josef Och,et al.  Minimum Error Rate Training in Statistical Machine Translation , 2003, ACL.

[2]  Hwee Tou Ng,et al.  How Far are We from Fully Automatic High Quality Grammatical Error Correction? , 2015, ACL.

[3]  Helen Yannakoudakis,et al.  Neural Sequence-Labelling Models for Grammatical Error Correction , 2017, EMNLP.

[4]  H. Ng,et al.  A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction , 2018, AAAI.

[5]  Rico Sennrich,et al.  Nematus: a Toolkit for Neural Machine Translation , 2017, EACL.

[6]  Dan Roth,et al.  Grammatical Error Correction: Machine Translation and Classifiers , 2016, ACL.

[7]  Shamil Chollampatt,et al.  Exploiting N-Best Hypotheses to Improve an SMT Approach to Grammatical Error Correction , 2016, IJCAI.

[8]  Matt Post,et al.  Ground Truth for Grammatical Error Correction Metrics , 2015, ACL.

[9]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[10]  Stephan Vogel,et al.  Parallel Implementations of Word Alignment Tool , 2008, SETQALNLP.

[11]  Kenneth Heafield,et al.  N-gram Counts and Language Models from the Common Crawl , 2014, LREC.

[12]  Alexander M. Rush,et al.  Adapting Sequence Models for Sentence Correction , 2017, EMNLP.

[13]  Steven Bird,et al.  NLTK: The Natural Language Toolkit , 2002, ACL.

[14]  Matt Post,et al.  Grammatical Error Correction with Neural Reinforcement Learning , 2017, IJCNLP.

[15]  Jianfeng Gao,et al.  A Nested Attention Neural Hybrid Model for Grammatical Error Correction , 2017, ACL.

[16]  Ted Briscoe,et al.  Grammatical error correction using neural machine translation , 2016, NAACL.

[17]  Marcin Junczys-Dowmunt,et al.  Phrase-based Machine Translation is State-of-the-Art for Automatic Grammatical Error Correction , 2016, EMNLP.

[18]  Hwee Tou Ng,et al.  Building a Large Annotated Corpus of Learner English: The NUS Corpus of Learner English , 2013, BEA@NAACL-HLT.

[19]  George F. Foster,et al.  Batch Tuning Strategies for Statistical Machine Translation , 2012, NAACL.

[20]  Shamil Chollampatt,et al.  Connecting the Dots: Towards Human-Level Grammatical Error Correction , 2017, BEA@EMNLP.

[21]  Daniel Jurafsky,et al.  Neural Language Correction with Character-Based Attention , 2016, ArXiv.

[22]  Zoubin Ghahramani,et al.  A Theoretically Grounded Application of Dropout in Recurrent Neural Networks , 2015, NIPS.

[23]  Joel R. Tetreault,et al.  JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction , 2017, EACL.

[24]  Rico Sennrich,et al.  Neural Machine Translation of Rare Words with Subword Units , 2015, ACL.

[25]  Kenneth Heafield,et al.  KenLM: Faster and Smaller Language Model Queries , 2011, WMT@EMNLP.

[26]  Hwee Tou Ng,et al.  The CoNLL-2013 Shared Task on Grammatical Error Correction , 2013, CoNLL Shared Task.

[27]  Nadir Durrani,et al.  A Joint Sequence Translation Model with Integrated Reordering , 2011, ACL.

[28]  Hwee Tou Ng,et al.  Better Evaluation for Grammatical Error Correction , 2012, NAACL.

[29]  Rico Sennrich,et al.  Edinburgh Neural Machine Translation Systems for WMT 16 , 2016, WMT.

[30]  Yuji Matsumoto,et al.  The Effect of Learner Corpus Size in Grammatical Error Correction of ESL Writings , 2012, COLING.