Error-tolerant Finite State Recognition

Error-tolerant recognition enables the recognition of strings that deviate slightly from any string in the regular set recognized by the underlying finite state recognizer. In the context of natural language processing, it has applications in error-tolerant morphological analysis and spelling correction. After a description of the concepts and algorithms involved, we give examples from these two applications. In morphological analysis, error-tolerant recognition allows misspelled input word forms to be corrected and morphologically analyzed concurrently. The algorithm can be applied to the morphological analysis of any language whose morphology is fully captured by a single (and possibly very large) finite state transducer, regardless of the word formation processes (such as agglutination or productive compounding) and morphographemic phenomena involved. We present an application to error-tolerant analysis of the agglutinative morphology of Turkish words. In spelling correction, error-tolerant recognition can be used to enumerate correct candidate forms from a given misspelled string within a certain edit distance. It can be applied to any language whose morphology is fully described by a finite state transducer, or for which a word list comprising all inflected forms is available. With very large word lists of root and inflected forms (some containing well over 200,000 forms), it generates all candidate solutions within 10 to 45 milliseconds (with edit distance 1) on a SparcStation 10/41. For spelling correction in Turkish, error-tolerant recognition operating with a (circular) recognizer of Turkish words (with about 29,000 states and 119,000 transitions) can generate all candidate words in less than 20 milliseconds (with edit distance 1). Spelling correction using a recognizer constructed from a large German word list that simulates compounding also indicates that the approach is applicable in such cases.
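The abstract does not spell out the search itself, so the following is only a minimal sketch under stated assumptions. It assumes a deterministic recognizer given as a graph of `State` objects (built here as a trie from a word list by the hypothetical helper `build_trie`), and an edit distance that counts insertions, deletions, substitutions and transpositions of adjacent characters. The search walks the recognizer depth-first, computes one new row of the edit-distance table per transition, and abandons a path as soon as every prefix of the input whose length is within `t` of the candidate's length is more than `t` edits away (a cut-off edit distance bound), since no extension of that path can then come back within `t`. The names `State`, `build_trie`, `next_row` and `error_tolerant_lookup` are illustrative and not taken from the paper.

```python
from typing import Dict, Iterator, List


class State:
    """A state of a deterministic recognizer (assumption: a plain graph node)."""

    def __init__(self) -> None:
        self.edges: Dict[str, "State"] = {}
        self.final = False


def build_trie(words: List[str]) -> State:
    """Hypothetical stand-in for the recognizer: a trie built from a word list."""
    root = State()
    for word in words:
        state = root
        for ch in word:
            state = state.edges.setdefault(ch, State())
        state.final = True
    return root


def next_row(x: str, y: str, prev: List[int], prev2: List[int]) -> List[int]:
    """Return [ed(x[:i], y) for i in 0..len(x)], given the rows for y[:-1]
    (prev) and y[:-2] (prev2). Assumed distance: insertion, deletion,
    substitution and transposition of adjacent characters, each costing 1."""
    n, c = len(y), y[-1]
    row = [n] + [0] * len(x)
    for i in range(1, len(x) + 1):
        cost = 0 if x[i - 1] == c else 1
        row[i] = min(prev[i] + 1,         # y has an extra character
                     row[i - 1] + 1,      # x has an extra character
                     prev[i - 1] + cost)  # match or substitution
        # Adjacent transposition: x ends in "ab" where y ends in "ba".
        if i > 1 and n > 1 and x[i - 1] == y[-2] and x[i - 2] == c:
            row[i] = min(row[i], prev2[i - 2] + 1)
    return row


def error_tolerant_lookup(root: State, x: str, t: int) -> Iterator[str]:
    """Yield every string accepted by the recognizer within distance t of x."""
    m = len(x)
    first = list(range(m + 1))  # ed(x[:i], "") = i
    if root.final and first[m] <= t:
        yield ""
    # Stack entries: (state, candidate prefix y, row for y[:-1], row for y).
    stack = [(root, "", first, first)]
    while stack:
        state, y, prev, cur = stack.pop()
        for ch, nxt in state.edges.items():
            cand = y + ch
            row = next_row(x, cand, cur, prev)
            n = len(cand)
            lo, hi = max(0, n - t), min(m, n + t)
            # Cut-off edit distance: prune if every prefix of x in the band
            # [n - t, n + t] is more than t edits away from cand.
            if lo > hi or min(row[lo:hi + 1]) > t:
                continue
            if nxt.final and row[m] <= t:
                yield cand
            stack.append((nxt, cand, cur, row))
```

For example, with the word list `["word", "words", "rod", "wood", "world", "sword", "ward"]`, `sorted(error_tolerant_lookup(build_trie(words), "wrod", 1))` returns `['rod', 'wood', 'word']`. Because the cut-off rejects every candidate longer than `len(x) + t`, the search also terminates on cyclic recognizers such as a circular one; a tuned version would compute only the band of each row that the cut-off inspects rather than the full row.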

Kemal Oflazer