Paper information - Text/graphic separation using a sparse representation with multi-learned dictionaries

In this paper, we propose a new approach to extracting text regions from graphical documents. We first empirically construct two sequences of learned dictionaries, one for the text parts and one for the graphical parts. We then compute the sparse representations of non-overlapping document patches of several sizes in these learned dictionaries. Based on these representations, each patch is classified as text or graphic by comparing its reconstruction errors. Same-sized patches in one category are then merged to define the corresponding text or graphic layers, which are combined to create a final text/graphic layer. Finally, in a post-processing step, the text regions are further filtered using learned thresholds.
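The classification step described above can be illustrated with a minimal sketch. It assumes two dictionaries are already available as matrices whose columns are atoms, and it uses a small greedy orthogonal matching pursuit as the sparse coder. The abstract does not name the solver, the dictionary learning method, or the sparsity level, so `omp`, `classify_patch`, and `n_nonzero` are hypothetical names and choices made only for illustration.

```python
import numpy as np


def omp(D, x, n_nonzero):
    """Greedy orthogonal matching pursuit (an assumed solver, not
    necessarily the one used in the paper). Approximates x with at
    most n_nonzero columns (atoms) of D and returns the coefficients."""
    x = x.astype(float)
    residual = x.copy()
    support = []
    sol = np.zeros(0)
    for _ in range(n_nonzero):
        if np.linalg.norm(residual) < 1e-10:
            break  # x is already reconstructed
        # Pick the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break  # no new atom improves the fit
        support.append(k)
        # Refit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef = np.zeros(D.shape[1])
    if support:
        coef[support] = sol
    return coef


def reconstruction_error(D, x, n_nonzero):
    """Euclidean norm of x minus its sparse reconstruction in D."""
    return float(np.linalg.norm(x - D @ omp(D, x, n_nonzero)))


def classify_patch(patch, D_text, D_graphic, n_nonzero=3):
    """Label a patch by the dictionary that reconstructs it with the
    smaller error. Ties go to 'graphic' (an arbitrary choice)."""
    x = np.asarray(patch, dtype=float).reshape(-1)
    err_text = reconstruction_error(D_text, x, n_nonzero)
    err_graphic = reconstruction_error(D_graphic, x, n_nonzero)
    return "text" if err_text < err_graphic else "graphic"
```

In a full pipeline along the lines of the abstract, this comparison would be repeated for each patch size with the dictionary learned for that size, and the per-size labels would then be merged into layers. That merging and the threshold-based post-processing are not sketched here because the abstract gives no details about them.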

Salvatore Tabbone | Oriol Ramos Terrades | Thanh-Ha Do

[1] Zhaoyang Lu, et al. Detection of Text Regions From Digital Engineering Drawings, 1998, IEEE Trans. Pattern Anal. Mach. Intell.

[2] Michael Elad, et al. Sparse and Redundant Representations - From Theory to Applications in Signal and Image Processing, 2010.

[3] Rangachar Kasturi, et al. Improved Directional Morphological Operations for Separation of Characters from Maps/Graphics, 1997, GREC.

[4] Salvatore Tabbone, et al. Text extraction from graphical document images using sparse representation, 2010, DAS '10.

[5] Ching Y. Suen, et al. Text detection from scene images using sparse representation, 2008, 2008 19th International Conference on Pattern Recognition.

[6] Bart Lamiroy, et al. Text/Graphics Separation Revisited, 2002, Document Analysis Systems.

[7] Rangachar Kasturi, et al. A Robust Algorithm for Text String Separation from Mixed Text/Graphics Images, 1988, IEEE Trans. Pattern Anal. Mach. Intell.

[8] Dov Dori, et al. Vector-Based Segmentation of Text Connected to Graphics in Engineering Drawings, 1996, SSPR.