
Language Models for Contextual Error Detection and Correction

The problem of identifying and correcting confusibles, i.e. context-sensitive spelling errors, in text is typically tackled using specifically trained machine learning classifiers. For each different set of confusibles, a specific classifier is trained and tuned. In this research, we investigate a more generic approach to context-sensitive confusible correction. Instead of using specific classifiers, we use one generic classifier based on a language model. The language model measures the likelihood of a sentence with each of the possible alternatives of a confusible in place. The advantage of this approach is that all confusible sets are handled by a single model. Preliminary results show that the performance of the generic classifier approach is only slightly worse than that of the specific classifier approach.
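The idea in the abstract, scoring a sentence once per alternative in a confusible set and keeping the most likely variant, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the add-one smoothed bigram model, the toy corpus, and the hypothetical helper names `train_bigram_counts`, `sentence_log_prob`, and `correct_confusible`. The paper's actual language model and training data may differ.

```python
import math
from collections import Counter


def train_bigram_counts(sentences):
    """Collect unigram and bigram counts from whitespace-tokenised sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def sentence_log_prob(tokens, unigrams, bigrams):
    """Log-probability of a token list under an add-one smoothed bigram model.

    Add-one smoothing is an illustrative assumption, chosen for brevity.
    """
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    log_prob = 0.0
    for prev, word in zip(padded, padded[1:]):
        numerator = bigrams[(prev, word)] + 1
        denominator = unigrams[prev] + vocab_size
        log_prob += math.log(numerator / denominator)
    return log_prob


def correct_confusible(tokens, position, confusible_set, unigrams, bigrams):
    """Return the member of confusible_set that makes the sentence most likely."""
    def score(candidate):
        variant = tokens[:position] + [candidate] + tokens[position + 1:]
        return sentence_log_prob(variant, unigrams, bigrams)

    return max(confusible_set, key=score)


if __name__ == "__main__":
    # Toy training corpus (hypothetical, for illustration only).
    corpus = [
        "they went to their house",
        "their dog is over there",
        "there is a dog in the house",
        "the house is over there",
    ]
    unigrams, bigrams = train_bigram_counts(corpus)

    # One generic model handles any confusible set passed in at query time.
    sentence = "the dog is over their".split()
    print(correct_confusible(sentence, 4, ("their", "there"), unigrams, bigrams))
    # prints "there"
```

Because the confusible set is only an argument to `correct_confusible`, the same counts serve every set, which mirrors the single-model advantage the abstract describes. A per-set classifier would instead need separate training for each set.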

Herman Stehouwer | Menno van Zaanen
