Context Processing to Read Text on Damaged Wooden Tablets

This paper describes context processing that presents candidate readings for damaged scripts on wooden tablets (mokkans). Because mokkans excavated from ancient strata are often damaged, even archeologists have difficulty reading their scripts. Very often, ink in several areas has faded or been completely lost, and some characters may be misrecognized even though the remaining characters must be read on the basis of them. The proposed context processing extends the Aho-Corasick method to allow self-transitions, so that it presents candidates even for scripts with lost ink and misrecognized characters. For evaluation, we used 4,041 place names of 8th-century Japan as the vocabulary; each place name consists of 9 to 11 characters. Test keywords were prepared by deleting 1 to 6 characters (lost ink) and replacing 0 to 2 characters with other characters drawn from the vocabulary (misrecognition). Even for keywords with five characters lost and one character replaced, the correct name appears among the top 10 candidates in 71.7% of cases.
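The abstract does not detail how the self-transition is built into the Aho-Corasick automaton, so the following Python is only an illustrative sketch under our own assumptions, not the authors' method. It assumes that a lost-ink region of unknown length is marked with a hypothetical `*` symbol, that each misrecognized character costs one substitution, and that candidates are ranked by the number of substitutions. Because whole place names are matched, the sketch keeps only the trie (the goto function of Aho-Corasick) and omits failure links. The names `build_trie`, `candidates`, `LOST`, `max_sub`, and `top_k` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

LOST = "*"  # hypothetical marker: a lost-ink region of unknown length


@dataclass
class Node:
    """One state of the trie (the goto function of an Aho-Corasick automaton)."""
    children: Dict[str, "Node"] = field(default_factory=dict)
    word: Optional[str] = None  # vocabulary entry that ends at this state


def build_trie(vocabulary: List[str]) -> Node:
    root = Node()
    for word in vocabulary:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.word = word
    return root


def candidates(root: Node, query: str, max_sub: int = 1,
               top_k: int = 10) -> List[Tuple[str, int]]:
    """Return up to top_k (word, substitutions) pairs, fewest substitutions first."""
    best: Dict[str, int] = {}
    seen: Set[Tuple[int, int, int]] = set()
    stack = [(root, 0, 0)]  # (trie state, position in query, substitutions used)
    while stack:
        node, i, subs = stack.pop()
        key = (id(node), i, subs)
        if key in seen:
            continue
        seen.add(key)
        if i == len(query):
            # Query fully consumed: accept only if a whole name ends here.
            if node.word is not None:
                best[node.word] = min(subs, best.get(node.word, subs))
            continue
        if query[i] == LOST:
            # Leave the lost region without consuming a vocabulary character.
            stack.append((node, i + 1, subs))
            # Self-transition: stay on the same query symbol while the trie
            # consumes one vocabulary character hidden by the lost ink.
            for child in node.children.values():
                stack.append((child, i, subs))
        else:
            for ch, child in node.children.items():
                cost = 0 if ch == query[i] else 1  # mismatch = misrecognized char
                if subs + cost <= max_sub:
                    stack.append((child, i + 1, subs + cost))
    ranked = sorted(best.items(), key=lambda item: (item[1], item[0]))
    return ranked[:top_k]
```

For example, with the vocabulary `["abcdefghi", "abcxyzghi", "pqrstuvwx"]`, `candidates(build_trie(vocab), "abcd*hi")` ranks `abcdefghi` first (no substitutions) and `abcxyzghi` second (one substitution, `d` read where `x` stands). The `seen` set bounds the search to trie states × query positions × substitution counts, so the loop terminates even though the self-transition branches at every lost region.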
