Efficiently extracting and classifying objects for analyzing color documents

Conventional object extraction methods are inefficient for color document images that contain large graphics. For example, projection-profile and connected-component based methods scan a large graphic many times. Yet when a large graphic is extracted, conventional methods simply represent it by a rectangle. Thus, scanning the interior of large graphics is time-consuming. This paper proposes a novel system for efficiently analyzing color documents that solves this problem. The proposed system comprises color transformation, background color determination, top-down object extraction, and parameter-free object classification. The system is efficient because it scans only background pixels, so its time complexity is O(N_B), where N_B is the total number of background-color pixels. The results of this study demonstrate that the system is more effective and efficient than other methods. Moreover, owing to its simplicity and efficiency, the proposed algorithm can run in an embedded environment (such as a mobile device) and be used in real-time systems.
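
The idea of visiting only background pixels can be illustrated with a minimal sketch. It is not the algorithm of the paper, whose details the abstract does not give. It assumes that the page has already been reduced to a grid of quantized color labels, that the background is simply the most frequent label, and that the background is 4-connected to the page border. The names `dominant_color` and `scan_background` are hypothetical.

```python
from collections import Counter, deque


def dominant_color(image):
    """Return the most frequent color label (assumed here to be the background)."""
    counts = Counter(pixel for row in image for pixel in row)
    return counts.most_common(1)[0][0]


def scan_background(image, background):
    """Flood-fill the background from the page border.

    Only background pixels are enqueued and expanded. A non-background
    neighbor is recorded as an object boundary pixel but never expanded,
    so the interior of a large graphic is never traversed.
    Returns (number of background pixels visited, set of boundary pixels).
    """
    height, width = len(image), len(image[0])
    visited = set()
    boundary = set()
    queue = deque()

    # Seed the fill with the background pixels that lie on the page border.
    border = [(y, x) for y in range(height) for x in (0, width - 1)]
    border += [(y, x) for y in (0, height - 1) for x in range(width)]
    for y, x in border:
        if image[y][x] == background and (y, x) not in visited:
            visited.add((y, x))
            queue.append((y, x))

    while queue:
        y, x = queue.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < height and 0 <= nx < width):
                continue
            if image[ny][nx] == background:
                if (ny, nx) not in visited:
                    visited.add((ny, nx))
                    queue.append((ny, nx))
            else:
                # Object pixel touching the background: part of a contour.
                boundary.add((ny, nx))

    return len(visited), boundary
```

In this sketch the fill does work proportional to the number of background pixels it reaches, which mirrors the O(N_B) argument above. The histogram in `dominant_color` still reads every pixel once, and background regions enclosed by an object are not reached from the border, so a fuller implementation would need to handle both cases.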
