Developing an open‐source, rule‐based proofreading tool

In this paper, we show how an open‐source, language‐independent proofreading tool has been built. Many languages lack contextual proofreading tools; for many others, only partial solutions are available. Using existing, largely language‐independent tools and collaborative processes, it is possible to develop a practical style and grammar checker and to fight the digital divide in countries where commercial linguistic application software is unavailable or too expensive for average users. The described solution depends on relatively easily available language resources and requires neither a fully formalized grammar nor a deep parser, yet it can detect many frequent context‐dependent spelling mistakes, as well as grammatical, punctuation, usage, and stylistic errors. Copyright © 2010 John Wiley & Sons, Ltd.
