Segmenting text images with massively parallel machines

Image segmentation, the partitioning of an image into meaningful parts, is a major concern of any computer vision system. The meaningful parts of a text image are lines of text, words, and characters. This paper examines the segmentation of pages of text into lines and of lines of text into characters on a parallel machine. A parallel machine makes practical segmentation techniques that would require too much computation time on a serial machine. Using spatial histograms, a parallel machine can segment images of text lines into characters with an accuracy of 97.9% at a speed of 30 milliseconds or less per character. Statistically adaptive rules based on dynamic adaptive sampling are used for line segmentation and to improve the accuracy of character segmentation. Lines can also be segmented from a page using a set of statistically adaptive rules that allow sloped lines of text to be segmented. Using these statistical rules on a parallel machine increases processing time by no more than 1 millisecond per character. Combining the statistical rules with knowledge of the printing style raises segmentation accuracy to 99.2% for machine-printed text and 89.6% for hand-printed text.
