Context-Based Spelling Correction for Japanese OCR

We present a novel spelling correction method for languages that do not delimit words, such as Japanese, Chinese, and Thai. The method combines approximate word matching with an N-best word segmentation algorithm based on a statistical language model. On OCR errors, the proposed word-based correction outperforms conventional character-based correction. When the baseline character recognition accuracy is 90%, it achieves 96.0% character recognition accuracy and 96.3% word segmentation accuracy, whereas character-based correction reaches only 93.3% character recognition accuracy.
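To show what N-best word segmentation with a statistical language model involves, here is a minimal sketch. It assumes a toy unigram model (the `UNIGRAM` table, its probabilities, the unknown-character penalty `UNKNOWN_LOGPROB`, and the function names are hypothetical, not taken from the paper) and uses unspaced English letters as a stand-in for Japanese text. The approximate word matching step over OCR output is not modeled here. The dynamic program keeps the top `n` scored segmentations of every prefix. Because scores add in log space, the `n` best segmentations of the whole string can always be built from the `n` best of some shorter prefix.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical toy unigram language model: word -> probability.
UNIGRAM: Dict[str, float] = {
    "the": 0.05,
    "table": 0.01,
    "tab": 0.002,
    "le": 0.001,
    "able": 0.005,
    "a": 0.03,
}
# Assumed log-probability penalty for an out-of-vocabulary single character.
UNKNOWN_LOGPROB = math.log(1e-6)
MAX_WORD_LEN = max(len(w) for w in UNIGRAM)

Hypothesis = Tuple[float, List[str]]  # (log-probability, word sequence)


def word_logprob(word: str) -> Optional[float]:
    """Return the word's log-probability, or None if it cannot be a token."""
    if word in UNIGRAM:
        return math.log(UNIGRAM[word])
    if len(word) == 1:
        return UNKNOWN_LOGPROB  # single unknown characters are always allowed
    return None


def nbest_segment(text: str, n: int) -> List[Hypothesis]:
    """Return up to n segmentations of text, best (highest log-prob) first."""
    # best[i] holds the top-n hypotheses covering text[:i].
    best: List[List[Hypothesis]] = [[] for _ in range(len(text) + 1)]
    best[0] = [(0.0, [])]
    for end in range(1, len(text) + 1):
        candidates: List[Hypothesis] = []
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = text[start:end]
            lp = word_logprob(word)
            if lp is None:
                continue
            # Extend every kept hypothesis ending at `start` with this word.
            for score, words in best[start]:
                candidates.append((score + lp, words + [word]))
        best[end] = heapq.nlargest(n, candidates, key=lambda h: h[0])
    return best[len(text)]


if __name__ == "__main__":
    for score, words in nbest_segment("thetable", 2):
        print(f"{score:.2f}", words)
```

Here the two best segmentations are `['the', 'table']` followed by `['the', 'tab', 'le']`. In a real OCR correction setting the input would be a lattice of candidate characters, and an approximate word matcher would propose dictionary words that differ from the recognized string. This sketch only illustrates the N-best search over exact matches.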