Error-tolerant Finite State Recognition

Error-tolerant recognition enables the recognition of strings that deviate slightly from any string in the regular set recognized by the underlying finite state recognizer. In the context of natural language processing, it has applications in error-tolerant morphological analysis and spelling correction. After a description of the concepts and algorithms involved, we give examples from these two applications. In morphological analysis, error-tolerant recognition allows misspelled input word forms to be corrected and morphologically analyzed concurrently. The algorithm can be applied to the morphological analysis of any language whose morphology is fully captured by a single (and possibly very large) finite state transducer, regardless of the word formation processes (such as agglutination or productive compounding) and morphographemic phenomena involved. We present an application to error-tolerant analysis of the agglutinative morphology of Turkish words. In spelling correction, error-tolerant recognition can be used to enumerate correct candidate forms from a given misspelled string within a certain edit distance. It can be applied to any language whose morphology is fully described by a finite state transducer, or for which there is a word list comprising all inflected forms. With very large word lists of root and inflected forms (some containing well over 200,000 forms), it generates all candidate solutions within 10 to 45 milliseconds (with edit distance 1) on a SparcStation 10/41. For spelling correction in Turkish, error-tolerant recognition operating with a (circular) recognizer of Turkish words (with about 29,000 states and 119,000 transitions) can generate all candidate words in less than 20 milliseconds (with edit distance 1). Spelling correction using a recognizer constructed from a large German word list that simulates compounding also indicates that the approach is applicable in such cases.
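The idea of enumerating candidates within an edit distance threshold can be illustrated with a small sketch. It is not the algorithm described in this work: it assumes the recognizer is a plain trie (an acyclic recognizer built from a word list), uses standard Levenshtein distance (insertion, deletion, substitution, with no transposition), and prunes a path as soon as no extension can stay within the threshold. The names `build_trie`, `candidates`, and `threshold`, and the toy lexicon, are hypothetical and chosen only for illustration.

```python
def build_trie(words):
    """Build a trie, i.e. an acyclic finite state recognizer for a word list."""
    root = {"final": False, "next": {}}
    for word in words:
        node = root
        for ch in word:
            node = node["next"].setdefault(ch, {"final": False, "next": {}})
        node["final"] = True
    return root


def candidates(root, query, threshold):
    """Return {word: distance} for accepted words within `threshold` edits.

    Assumption: plain Levenshtein distance, computed one row per trie edge.
    """
    found = {}
    # Row for the empty prefix: distance to query[:j] is j deletions.
    first_row = list(range(len(query) + 1))

    def visit(node, prefix, row):
        # row[-1] is the distance between `prefix` and the whole query.
        if node["final"] and row[-1] <= threshold:
            found[prefix] = row[-1]
        for ch, child in node["next"].items():
            new_row = [row[0] + 1]
            for j in range(1, len(query) + 1):
                cost = 0 if query[j - 1] == ch else 1
                new_row.append(min(new_row[j - 1] + 1,   # skip a query char
                                   row[j] + 1,           # extra char in prefix
                                   row[j - 1] + cost))   # match or substitute
            # Row minima never decrease along a path, so this cut-off is safe.
            if min(new_row) <= threshold:
                visit(child, prefix + ch, new_row)

    visit(root, "", first_row)
    return found


lexicon = ["read", "reads", "red", "bread", "road", "rode"]
root = build_trie(lexicon)
print(candidates(root, "reed", 1))  # {'read': 1, 'red': 1}
```

The sketch covers only the word-list case; a recognizer with cycles, such as the Turkish one mentioned above, would need a state graph in place of the trie.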
