Adaptive OCR with limited user feedback

A methodology is proposed for processing noisy printed documents with limited user feedback. Character templates are extracted from a specific collection of scanned documents without the support of ground truth. The approach is adaptive in that the extracted templates are then used to train an OCR classifier quickly, with only limited user feedback. Experimental results show that the approach is especially useful for processing noisy documents with many touching characters.
