Heuristic image decoding using separable source models

This paper describes an approach to reducing the computational cost of document image decoding using Markov source models. The kernel of the approach is an informed best-first search algorithm called the iterated complete path (ICP) algorithm. ICP reduces computation by performing full Viterbi decoding only in those regions of the decoding trellis that are likely to contain the best path. These regions are identified by upper-bounding the full decoding score with simple heuristic functions. Three types of heuristics have been explored, based on horizontal pixel projection, adjacent-row scores, and decoding a reduced-resolution image. Using these heuristics to decode text pages and telephone yellow page columns has yielded speedup factors of 3 to 25, leading to decoding times of about 1 minute per text page and 3 minutes per yellow page column on a four-processor machine.
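The abstract does not spell out the ICP procedure, so the following is only a minimal sketch of the general idea it describes: score every candidate region with a cheap upper bound, find the best complete path, replace the bounds on that path with exact scores, and repeat until the best path uses exact scores only. The graph layout, the function names (`best_path`, `icp`, `upper_bound`, `full_score`), and the toy scores are all hypothetical. In particular, the expensive full Viterbi decoding of a region is stood in for by a table lookup.

```python
import math

def best_path(n_nodes, edges, score):
    """Highest-scoring path from node 0 to node n_nodes - 1 in a DAG.

    Assumes every edge (u, v) has u < v and that the last node is reachable.
    """
    best = [-math.inf] * n_nodes
    back = [None] * n_nodes
    best[0] = 0.0
    # Sorting by target node guarantees best[u] is final before it is used.
    for u, v in sorted(edges, key=lambda e: e[1]):
        cand = best[u] + score[(u, v)]
        if cand > best[v]:
            best[v] = cand
            back[v] = (u, v)
    path = []
    node = n_nodes - 1
    while node != 0:
        edge = back[node]
        path.append(edge)
        node = edge[0]
    path.reverse()
    return path

def icp(n_nodes, edges, upper_bound, full_score):
    """Iterated-complete-path style search (illustrative sketch only)."""
    # Start with the cheap heuristic upper bound on every edge.
    score = {e: upper_bound(e) for e in edges}
    exact = set()
    while True:
        path = best_path(n_nodes, edges, score)
        pending = [e for e in path if e not in exact]
        if not pending:
            # Every edge on the best path is exactly scored, and all other
            # paths are bounded from above, so this path is optimal.
            return path, sum(score[e] for e in path), len(exact)
        for e in pending:
            score[e] = full_score(e)  # the expensive step in a real decoder
            exact.add(e)

# Toy data (hypothetical): exact scores and upper bounds with UPPER >= TRUE.
TRUE = {(0, 1): 2, (1, 2): 2, (0, 2): 5, (2, 3): 1,
        (3, 4): 1, (2, 4): 4, (0, 4): 6, (1, 4): 3}
UPPER = {(0, 1): 3, (1, 2): 3, (0, 2): 5.5, (2, 3): 1.5,
         (3, 4): 2, (2, 4): 4.5, (0, 4): 7, (1, 4): 4}
EDGES = list(TRUE)

path, total, n_decoded = icp(5, EDGES, UPPER.__getitem__, TRUE.__getitem__)
print(path, total, f"{n_decoded} of {len(EDGES)} edges fully scored")
```

The result is guaranteed optimal only if the heuristic never underestimates the exact score. The saving comes from edges that never appear on a candidate best path and are therefore never fully scored.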
