Word extraction using irregular pyramid

This paper proposes a new algorithm for extracting text from document images, focusing on the extraction of word groups. The algorithm is built on an irregular pyramid structure. Its distinguishing feature is that it includes strategic background information in the analysis, which most techniques discard: both the foreground (i.e. text areas) and a portion of the background (i.e. white areas) are examined. The algorithm rests on the concept of 'closeness': text elements within a group are closer to one another, in terms of spatial distance, than they are to other text areas. The results are encouraging, with the algorithm able to correctly group words of different sizes, fonts, arrangements and orientations.
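The 'closeness' idea can be illustrated with a minimal sketch. This is not the paper's irregular pyramid algorithm; it substitutes a plain union-find merge over bounding boxes, and the names `box_gap`, `group_by_closeness` and the threshold `max_gap` are hypothetical, introduced only for illustration. It assumes the text components have already been reduced to axis-aligned boxes `(x0, y0, x1, y1)`.

```python
from itertools import combinations


def box_gap(a, b):
    """Euclidean gap between two axis-aligned boxes (x0, y0, x1, y1).

    Returns 0.0 when the boxes touch or overlap.
    """
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return (dx * dx + dy * dy) ** 0.5


def group_by_closeness(boxes, max_gap):
    """Group box indices whose gaps are at most max_gap (assumed threshold).

    Boxes are merged transitively with union-find, so a chain of
    nearby characters ends up in one group.
    """
    parent = list(range(len(boxes)))

    def find(i):
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(boxes)), 2):
        if box_gap(boxes[i], boxes[j]) <= max_gap:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Five character boxes on one line: three close together, then two more.
boxes = [(0, 0, 10, 10), (12, 0, 22, 10), (24, 0, 34, 10),
         (60, 0, 70, 10), (72, 0, 82, 10)]
words = group_by_closeness(boxes, max_gap=3)
```

With a threshold of 3 pixels, the first three boxes form one group and the last two form another. A fixed threshold is the weakest part of this sketch: it would not adapt to varying font sizes, which is the kind of case the abstract says the pyramid-based approach handles.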
