A method of post-processing for character recognition based on syntactic and semantic analysis of sentences

Post-processing in character recognition refers to the processing used to correct recognition errors. When the input string represents a sentence and highly accurate error correction is required, the sentence should be examined both syntactically and semantically at the sentence level. This paper assumes that the morphemes, syntax, and semantics of the input sentence can be analyzed, and proposes a post-processing method that uses syntactic and semantic analysis. The proposed method takes as input a list of up to five candidate characters for each character position and outputs a sentence that is adequate both syntactically and semantically. The method has three main features: (1) during word matching, it also checks whether a syntactically and semantically adequate sentence can be composed, which suppresses the extraction of inadequate words; (2) characters subject to strong syntactic and semantic constraints, such as single-character particles and conjugational suffixes, are estimated top-down, so that cases in which the correct character is not among the candidates can still be handled; and (3) words whose adequacy cannot be determined from the syntactic or semantic viewpoint are selected by character re-recognition. In an experiment on 50 sample sentences, the character recognition rate improved from 83.0 percent to 98.0 percent, and the sentence recognition rate improved from 10.0 percent to 94.0 percent (that is, from 5 to 47 of the 50 sentences). Compared with a method based only on word matching, the sentence recognition rate improved by more than 20 percent. These results demonstrate the effectiveness of the proposed method.
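Word matching over a candidate lattice, combined with an adequacy check as in feature (1), can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: a toy lattice of Latin letters, a hypothetical cost equal to the sum of the ranks of the candidates used, and a hypothetical `is_adequate` predicate standing in for the paper's syntactic and semantic analysis. As a simplification, the check is applied to complete readings rather than during matching, and readings are enumerated exhaustively (exponential in the worst case). Top-down estimation of particles and suffixes (feature 2) and re-recognition (feature 3) are not modeled.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

# lattice[i] lists the recognizer's candidates for character position i,
# best first (the paper keeps up to five per position).
Lattice = List[List[str]]
Match = Tuple[int, str, int]  # (end position, word, rank cost)


def match_words(lattice: Lattice, lexicon: List[str]) -> Dict[int, List[Match]]:
    """Find every lexicon word spelled by one candidate per position.

    The cost of a match is the sum of the ranks of the candidates it uses
    (0 = top candidate); this scoring is a hypothetical choice for the sketch.
    """
    by_start: Dict[int, List[Match]] = {}
    n = len(lattice)
    for word in lexicon:
        for start in range(n - len(word) + 1):
            cost = 0
            for offset, ch in enumerate(word):
                candidates = lattice[start + offset]
                if ch not in candidates:
                    break
                cost += candidates.index(ch)
            else:  # every character of the word was among the candidates
                by_start.setdefault(start, []).append((start + len(word), word, cost))
    return by_start


def readings(by_start: Dict[int, List[Match]], n: int, start: int = 0) -> Iterator[Tuple[int, List[str]]]:
    """Enumerate every segmentation of positions start..n-1 into matched words."""
    if start == n:
        yield 0, []
        return
    for end, word, cost in by_start.get(start, []):
        for rest_cost, rest in readings(by_start, n, end):
            yield cost + rest_cost, [word] + rest


def best_sentence(
    lattice: Lattice,
    lexicon: List[str],
    is_adequate: Callable[[List[str]], bool] = lambda words: True,
) -> Optional[List[str]]:
    """Return the lowest-cost reading accepted by the (hypothetical) adequacy check."""
    by_start = match_words(lattice, lexicon)
    accepted = [r for r in readings(by_start, len(lattice)) if is_adequate(r[1])]
    return min(accepted)[1] if accepted else None


# Hypothetical toy data: the intended text is "thecatsat".
LATTICE = [["t", "l"], ["h", "b"], ["c", "e"], ["c", "e"], ["a", "o"],
           ["r", "t"], ["s", "5"], ["a"], ["t"]]
LEXICON = ["the", "cat", "car", "sat"]
ANIMATE = {"cat"}  # hypothetical toy semantic lexicon


def toy_semantics(words: List[str]) -> bool:
    # Hypothetical rule: "sat" needs an animate word immediately before it.
    return all(words[i - 1] in ANIMATE for i, w in enumerate(words) if w == "sat" and i > 0)


print(best_sentence(LATTICE, LEXICON))                 # ['the', 'car', 'sat']
print(best_sentence(LATTICE, LEXICON, toy_semantics))  # ['the', 'cat', 'sat']
```

Without the check, the cheapest reading "the car sat" wins because "car" uses only top-ranked candidates; with the toy semantic rule it is rejected, and "the cat sat", which relies on a second-ranked candidate, is selected instead.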