A new component based algorithm for newspaper layout analysis

The aim of layout analysis is to extract the geometric structure of a document image. It is the process of labeling the homogeneous regions of that image. To handle complex newspaper layouts, this paper proposes a new component-based, bottom-up algorithm. Using a novel homogeneity-related definition of distance, the algorithm maintains a dynamic minimal-distance mechanism that determines the order in which components are merged. Restricting rules, derived heuristically from newspaper layouts, constrain the merging so that it yields the preferred analysis result. Experimental results show that the proposed approach is effective.
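The abstract does not give the distance definition or the restricting rules, so the following is only a minimal sketch of the general idea of bottom-up merging driven by a dynamically maintained minimal distance. Everything in it is an assumption made for illustration: components are bounding boxes `(x0, y0, x1, y1)` with positive height, `distance` is a hypothetical homogeneity-weighted measure (the spatial gap, penalised when box heights differ), and a single threshold `max_dist` stands in for the paper's restricting rules.

```python
import heapq
from itertools import count


def gap(a, b):
    """Euclidean gap between two boxes (x0, y0, x1, y1); 0 if they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return (dx * dx + dy * dy) ** 0.5


def distance(a, b):
    """Hypothetical distance: spatial gap, scaled up when box heights differ.

    Assumes both boxes have positive height. This is NOT the paper's definition.
    """
    ha, hb = a[3] - a[1], b[3] - b[1]
    return gap(a, b) * (1 + abs(ha - hb) / max(ha, hb))


def union(a, b):
    """Smallest box enclosing both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_components(boxes, max_dist):
    """Repeatedly merge the closest pair of components until none is within max_dist.

    max_dist is a stand-in for the restricting rules (an assumption).
    """
    live = dict(enumerate(boxes))       # component id -> bounding box
    next_id = count(len(boxes))         # fresh ids for merged components
    heap = [(distance(live[i], live[j]), i, j)
            for i in live for j in live if i < j]
    heapq.heapify(heap)

    while heap:
        d, i, j = heapq.heappop(heap)   # current minimal distance
        if d > max_dist:
            break                       # every remaining pair is farther apart
        if i not in live or j not in live:
            continue                    # stale entry: one side was already merged
        merged = union(live.pop(i), live.pop(j))
        k = next(next_id)
        # Keep the minimal-distance structure up to date for the new component.
        for m, box in live.items():
            heapq.heappush(heap, (distance(merged, box), m, k))
        live[k] = merged

    return sorted(live.values())
```

Called on two adjacent boxes of equal height plus one distant box, for example `merge_components([(0, 0, 10, 10), (12, 0, 22, 10), (100, 0, 110, 10)], max_dist=5)`, the sketch merges the adjacent pair and leaves the distant box alone. The heap with lazy deletion of stale entries is one simple way to keep the minimal distance current after each merge; the paper may well use a different mechanism.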
