Document Image Decoding by Heuristic Search

This correspondence describes an approach to reducing the computational cost of document image decoding by viewing it as a heuristic search problem. The kernel of the approach is a modified dynamic programming (DP) algorithm, called the iterated complete path (ICP) algorithm, that is intended for use with separable source models. A set of heuristic functions is presented for decoding formatted text with ICP. Speedups of 3 to 25 times over DP have been observed when decoding text columns and telephone yellow pages with ICP and the proposed heuristics.
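The following is a minimal Python sketch of how a modified DP can use cheap heuristic scores to avoid most expensive text-line evaluations. It is not the paper's implementation. It assumes a hypothetical vertical source model in which each row is either consumed as blank or begins a text line of fixed height. It further assumes that a cheap heuristic score for a text line never underestimates that line's exact (expensive) score, which is the admissibility condition familiar from heuristic search. The names `icp_decode`, `blank_score`, `line_upper_bound` and `line_exact_score` are illustrative only. The loop runs DP with heuristic scores, takes the best complete path, computes exact scores only for the text lines on that path, and repeats until the best path contains no heuristic scores.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, int]  # ("blank", y) or ("line", y); y is the first row consumed


def icp_decode(
    height: int,
    line_height: int,
    blank_score: Callable[[int], float],
    line_upper_bound: Callable[[int], float],
    line_exact_score: Callable[[int], float],
) -> Tuple[float, List[Step], int]:
    """Hypothetical iterated-complete-path-style decoder over a toy vertical model.

    Requires line_upper_bound(y) >= line_exact_score(y) for every y.
    Returns (best score, best path, number of exact line evaluations).
    """
    exact: Dict[int, float] = {}  # cache of expensive text-line scores

    def line_value(y: int) -> float:
        # Use the exact score once known; otherwise the optimistic bound.
        return exact[y] if y in exact else line_upper_bound(y)

    while True:
        # Vertical DP: best[y] is the best score of a path covering rows 0..y-1.
        neg_inf = float("-inf")
        best = [neg_inf] * (height + 1)
        back: List[Optional[Step]] = [None] * (height + 1)
        best[0] = 0.0
        for y in range(height):
            if best[y] == neg_inf:
                continue
            s = best[y] + blank_score(y)  # consume one blank row
            if s > best[y + 1]:
                best[y + 1], back[y + 1] = s, ("blank", y)
            if y + line_height <= height:  # consume a whole text line
                s = best[y] + line_value(y)
                if s > best[y + line_height]:
                    best[y + line_height], back[y + line_height] = s, ("line", y)

        # Backtrack the best complete path from the bottom row.
        path: List[Step] = []
        end = height
        while end > 0:
            step = back[end]
            assert step is not None
            path.append(step)
            end = step[1]
        path.reverse()

        # Stop when every text line on the best path has an exact score.
        pending = [y for kind, y in path if kind == "line" and y not in exact]
        if not pending:
            return best[height], path, len(exact)
        for y in pending:
            exact[y] = line_exact_score(y)
```

No path's heuristic total ever falls below its exact total. A returned path whose text-line scores are all exact is therefore at least as good as every other path under exact scores, so the result matches full DP. Only lines that appear on some intermediate best path are scored exactly. For example:

```python
true_line = [1.0, 5.0, -2.0, 0.5, 3.0, 4.0, -1.0, 2.0]
score, path, n_exact = icp_decode(
    10, 3, lambda y: -0.5, lambda y: true_line[y] + 2.0, lambda y: true_line[y]
)
# score == 9.5, path == [("blank", 0), ("line", 1), ("line", 4), ("line", 7)],
# and only 3 of the 8 candidate line positions were scored exactly.
```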
