ERROR-CORRECTING PARSING FOR SYNTACTIC PATTERN RECOGNITION.

Abstract: The problem of modeling, analysis and reconstruction of noisy and/or distorted syntactic patterns is studied. Segmentation errors and primitive extraction errors can be treated as syntactic errors and defined in terms of language transformation rules. Three types of error transformations are defined on strings, namely substitution, insertion and deletion. A parser constructed from the grammar generating the strings together with these three types of transformations is called an error-correcting parser. This technique is also extended to tree languages. In formulating error-correcting tree automata (ECTA), five types of error transformations on trees are defined, namely substitution, split, stretch, branch and deletion. Using these language transformations, the distance between two sentences can be determined. A definition of the distance between a sentence and a language is proposed. Based on this definition, a clustering procedure is proposed in which error-correcting parsers are employed to determine the distance between an input syntactic pattern and a formed cluster, or a language.
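The string case can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: each substitution, insertion and deletion has unit cost; a language is approximated by a finite list of sample sentences rather than by a grammar; the distance between a sentence and a language is taken as the minimum distance to any sentence in it; and clustering uses a simple nearest-cluster rule with a threshold. The names `string_distance`, `language_distance` and `cluster` are hypothetical, and an actual error-correcting parser would work from the grammar directly instead of enumerating sentences.

```python
from typing import List, Sequence


def string_distance(x: Sequence, y: Sequence) -> int:
    """Fewest substitutions, insertions and deletions turning x into y
    (unit cost per transformation is an assumption of this sketch)."""
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, a in enumerate(x, start=1):
        curr = [i]  # turning x[:i] into the empty string takes i deletions
        for j, b in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,              # delete a from x
                curr[j - 1] + 1,          # insert b into x
                prev[j - 1] + (a != b),   # substitute a by b (free if equal)
            ))
        prev = curr
    return prev[-1]


def language_distance(x: Sequence, language: List[Sequence]) -> int:
    """Distance from sentence x to a language, assumed here to be the
    minimum distance to any sentence of a finite sample of the language."""
    return min(string_distance(x, y) for y in language)


def cluster(patterns: List[Sequence], threshold: int) -> List[List[Sequence]]:
    """Assign each pattern to the nearest existing cluster if it lies within
    the threshold; otherwise start a new cluster with it."""
    clusters: List[List[Sequence]] = []
    for x in patterns:
        best, best_d = None, None
        for c in clusters:
            d = language_distance(x, c)
            if best_d is None or d < best_d:
                best, best_d = c, d
        if best is not None and best_d <= threshold:
            best.append(x)
        else:
            clusters.append([x])
    return clusters


if __name__ == "__main__":
    print(string_distance("abcb", "abb"))
    print(language_distance("abab", ["ab", "aabb", "aaabbb"]))
    print(cluster(["aab", "aabb", "xyz", "xyzz"], threshold=1))
```

In this sketch each cluster stands in for an inferred language, so the same `language_distance` call serves for both "pattern to cluster" and "pattern to language". The tree case with its five transformations is not covered here.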