An Evaluation Model for Systems and Resources Employed in the Correction of Errors in Textual Documents

The wide adoption of Web 2.0 services has increased the amount of information produced, and the number of errors contained in that information has grown even faster. In traditional information production processes, documents were written by professionals, whereas on the Web the content is generated by the users themselves. Systems that manage information of variable quality must therefore take these errors into account. Our review of the state of the art identifies difficulties in the comparative evaluation of error correction systems. We propose an evaluation model for error correction systems and for the low-level string similarity (and distance) metrics they rely on. This model is implemented in an extensible platform that provides a framework for evaluating those systems.
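To make concrete what evaluating a string distance metric for error correction can look like, here is a minimal Python sketch. It is not the evaluation model or the platform described above, whose details are not given here. It assumes the simplest possible setup: a corrector that suggests the lexicon word closest to the erroneous word, scored by how often that suggestion matches the intended word. The names `levenshtein`, `nearest_word`, and `top1_accuracy`, as well as the toy lexicon and error pairs, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, Sequence, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def nearest_word(word: str, lexicon: Sequence[str],
                 distance: Callable[[str, str], int] = levenshtein) -> str:
    """Return the lexicon entry closest to `word` (first entry wins ties)."""
    return min(lexicon, key=lambda candidate: distance(word, candidate))


def top1_accuracy(pairs: Iterable[Tuple[str, str]], lexicon: Sequence[str],
                  distance: Callable[[str, str], int] = levenshtein) -> float:
    """Fraction of (error, intended) pairs whose nearest word is the intended one."""
    pairs = list(pairs)
    hits = sum(nearest_word(error, lexicon, distance) == intended
               for error, intended in pairs)
    return hits / len(pairs)


# Hypothetical toy data: two non-word errors and one real-word error.
lexicon = ["spelling", "correction", "form", "from"]
pairs = [("speling", "spelling"), ("corection", "correction"), ("form", "from")]
accuracy = top1_accuracy(pairs, lexicon)
```

Because the distance function is passed as a parameter, another metric can be swapped in and compared on the same error pairs. The real-word error ("form" typed for "from") is missed by any context-free metric of this kind, since the erroneous word is itself in the lexicon.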
