Text Segmentation from Bangla Land Map Images

Abstract

Text segmentation from land map images is a non-trivial task because map components are interleaved and overlap in complex spatial arrangements. In most Indic scripts, including Bangla (the 6th most spoken language in the world), the characters of a word are joined by a headline ("matra" or "shirorekha"), which makes the whole word a single connected component. It has been observed that the Delaunay triangulation (DT) forms more small triangles on text regions than on other regions of the map, a property that is especially pronounced for Bangla (and some other Indic) texts. This property is the main cue exploited here to segment text from the complex background of land map images. The proposed text segmentation approach is tested on a collected dataset of paper map images containing text in Bangla, an Indian regional language, and compared with an existing method; the results are encouraging.
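The following is a minimal sketch of the small-triangle cue described above, not the authors' implementation. It assumes that the points to triangulate are integer pixel coordinates taken from a binarized map (the abstract does not say which points the method triangulates), builds their Delaunay triangulation with the Bowyer-Watson algorithm in exact integer arithmetic, and flags a point as text-like when it is a vertex of at least `min_votes` triangles whose doubled area is at most `max_area2`. The function names and both thresholds are hypothetical.

```python
from collections import Counter

# Sketch only: not the paper's method. The choice of points and both thresholds are assumptions.
Point = tuple[int, int]
Triangle = tuple[Point, Point, Point]


def orient(a: Point, b: Point, c: Point) -> int:
    """Twice the signed area of triangle abc; positive when abc is counter-clockwise."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def in_circumcircle(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True if d is strictly inside the circumcircle of the counter-clockwise triangle abc."""
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = [
        (p[0] - d[0], p[1] - d[1], (p[0] - d[0]) ** 2 + (p[1] - d[1]) ** 2)
        for p in (a, b, c)
    ]
    det = (a1 * (b2 * c3 - b3 * c2)
           - a2 * (b1 * c3 - b3 * c1)
           + a3 * (b1 * c2 - b2 * c1))
    return det > 0


def delaunay(points: list[Point]) -> list[Triangle]:
    """Bowyer-Watson Delaunay triangulation of integer points, using exact integer arithmetic."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return []
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    d = max(max(xs) - min(xs), max(ys) - min(ys)) + 1
    mx, my = (min(xs) + max(xs)) // 2, (min(ys) + max(ys)) // 2
    n = len(pts)
    # Three extra "super-triangle" vertices (counter-clockwise) enclosing every input point.
    verts = pts + [(mx - 20 * d, my - d), (mx + 20 * d, my - d), (mx, my + 20 * d)]
    tris = {(n, n + 1, n + 2)}  # triangles as counter-clockwise index triples into verts
    for i in range(n):
        p = verts[i]
        # Triangles whose circumcircle strictly contains p form the cavity to re-triangulate.
        bad = [t for t in tris if in_circumcircle(verts[t[0]], verts[t[1]], verts[t[2]], p)]
        tris.difference_update(bad)
        edges = {(t[j], t[(j + 1) % 3]) for t in bad for j in range(3)}
        # An edge is on the cavity boundary if its reverse does not belong to another bad
        # triangle; joining it to p yields a new counter-clockwise triangle.
        for a, b in edges:
            if (b, a) not in edges:
                tris.add((a, b, i))
    # Discard triangles that still use a super-triangle vertex.
    return [(verts[a], verts[b], verts[c]) for a, b, c in tris if max(a, b, c) < n]


def small_triangle_votes(points: list[Point], max_area2: int) -> Counter:
    """For each point, count incident Delaunay triangles with doubled area <= max_area2."""
    votes: Counter = Counter()
    for tri in delaunay(points):
        # Returned triangles are counter-clockwise, so orient() is their positive doubled area.
        if orient(*tri) <= max_area2:
            votes.update(tri)
    return votes


def text_candidates(points: list[Point], max_area2: int, min_votes: int = 3) -> set[Point]:
    """Hypothetical rule: a point is text-like if it touches at least min_votes small triangles."""
    votes = small_triangle_votes(points, max_area2)
    return {p for p, v in votes.items() if v >= min_votes}
```

For example, on a 5 x 5 block of adjacent pixels plus a few isolated, far-away points, every interior pixel of the block is returned as a candidate while the isolated points are not, because all of their triangles are large. A practical system would more likely call an optimized triangulation routine (such as `scipy.spatial.Delaunay`) than this O(n²) loop, and the thresholds would have to be tuned to the scan resolution.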
