Paper information - Memory-based text correction for preposition and determiner errors

We describe the Valkuil.net team entry for the HOO 2012 Shared Task. Our system consists of four memory-based classifiers that generate correction suggestions for the middle position of small text windows spanning two words to the left and two words to the right. Trained on the Google 1TB 5-gram corpus, the first two classifiers determine whether a determiner or a preposition should be present between any two adjacent words in a text in which the actual determiners and prepositions are masked. The second pair of classifiers determines the most likely correction given a masked determiner or preposition. The hyperparameters that govern the classifiers are optimized on the shared task training data. We point out a number of obvious improvements that could boost the medium-level scores attained by the system.
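The windowing scheme in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tiny determiner set, the names `make_instances`, `MemoryClassifier`, and `train_pair`, and the plain overlap-based nearest-neighbour lookup are all assumptions made for illustration, standing in for the memory-based learner and the 5-gram training data used in the paper. Only the determiner pair of classifiers is shown; the preposition pair would follow the same pattern.

```python
from collections import Counter, defaultdict

# Assumption: a tiny illustrative determiner set, not the one used in the paper.
DETERMINERS = {"a", "an", "the"}
PAD = "_"  # padding symbol for positions outside the sentence


def make_instances(tokens, targets=DETERMINERS):
    """Yield (context, label) for the gap before each non-target word.

    Target words are masked out of the text. The context is two words to
    the left and two to the right of the gap; the label is the masked
    target that stood in the gap, or None if there was none.
    """
    words, labels = [], []
    pending = None
    for tok in tokens:
        tok = tok.lower()
        if tok in targets:
            pending = tok          # remember the masked word for the next gap
        else:
            words.append(tok)
            labels.append(pending)
            pending = None
    padded = [PAD, PAD] + words + [PAD, PAD]
    for i, label in enumerate(labels):
        left = tuple(padded[i:i + 2])
        right = tuple(padded[i + 2:i + 4])
        yield left + right, label


class MemoryClassifier:
    """Stores all training instances; classifies by feature overlap."""

    def __init__(self):
        self.memory = defaultdict(Counter)

    def train(self, instances):
        for context, label in instances:
            self.memory[context][label] += 1

    def classify(self, context):
        # Vote among the stored contexts that share the most positions.
        best, votes = -1, Counter()
        for stored, counts in self.memory.items():
            overlap = sum(a == b for a, b in zip(stored, context))
            if overlap > best:
                best, votes = overlap, Counter(counts)
            elif overlap == best:
                votes.update(counts)
        return votes.most_common(1)[0][0] if votes else None


def train_pair(sentences):
    """Train a presence classifier and a choice classifier on raw sentences."""
    presence, choice = MemoryClassifier(), MemoryClassifier()
    for sentence in sentences:
        instances = list(make_instances(sentence.split()))
        presence.train((ctx, lab is not None) for ctx, lab in instances)
        choice.train((ctx, lab) for ctx, lab in instances if lab is not None)
    return presence, choice
```

In this sketch the presence classifier answers whether a determiner belongs in a gap at all, and the choice classifier is consulted only for gaps where one is expected or already present, mirroring the division of labour between the two classifier pairs described above.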

Antal van den Bosch | Peter Berck
