Page segmentation without rectangle assumption

A new technique for page segmentation without skew normalization is described and applied to complex printed-page layouts in both English and Japanese. Because the technique makes no assumption about the shape of blocks, it handles skewed pages and can be extended to documents whose columns are not rectangular. The technique follows a bottom-up strategy: connected components are extracted from a reduced image and classified using their local information. The skew angle is also estimated from the local information of blocks, so the computation is fast. Text blocks are then merged into text lines and into columns using the skew information.
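The bottom-up steps named in the abstract (image reduction, connected-component extraction, skew estimation from local information) can be illustrated with a minimal sketch. The abstract does not give the reduction factor, the connectivity rule, or the skew estimator, so everything below is an assumption made for illustration: the function names `reduce_image`, `connected_components`, and `local_skew` are hypothetical, the reduction is a plain block-wise OR, components are 8-connected, and the skew is taken from a least-squares line through component centres. It is not the authors' implementation.

```python
import math
from collections import deque

def reduce_image(img, k):
    """OR-reduce a binary image (list of rows of 0/1) by a factor k (assumed scheme)."""
    h, w = len(img), len(img[0])
    return [[int(any(img[y][x]
                     for y in range(by, min(by + k, h))
                     for x in range(bx, min(bx + k, w))))
             for bx in range(0, w, k)]
            for by in range(0, h, k)]

def connected_components(img):
    """Return bounding boxes (x0, y0, x1, y1) of 8-connected foreground regions."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if not img[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[sy][sx] = True
            queue = deque([(sx, sy)])
            x0 = x1 = sx
            y0 = y1 = sy
            while queue:
                x, y = queue.popleft()
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < w and 0 <= ny < h
                                and img[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes

def local_skew(boxes):
    """Estimate skew in degrees from a least-squares line through box centres.

    Assumed estimator, not taken from the paper. The y axis points down,
    so a positive angle means the text line descends to the right.
    """
    cx = [(x0 + x1) / 2 for x0, _, x1, _ in boxes]
    cy = [(y0 + y1) / 2 for _, y0, _, y1 in boxes]
    n = len(boxes)
    mx, my = sum(cx) / n, sum(cy) / n
    sxx = sum((x - mx) ** 2 for x in cx)
    sxy = sum((x - mx) * (y - my) for x, y in zip(cx, cy))
    return math.degrees(math.atan2(sxy, sxx))
```

In this sketch the skew is computed only from the components passed in, so it can be applied to the components of a single block rather than to the whole page, which is the property the abstract credits for the short computation time. Classifying components and merging them into lines and columns would build on the same bounding boxes and are not shown.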
