A Character Detection Method for Ancient Yi Books Based on Connected Components and Regressive Character Segmentation

Character detection is a prerequisite for character recognition in ancient Yi books, and detection accuracy directly affects recognition performance. Ancient Yi books have complex layouts, lack standard typesetting, and mix images with text. To address these difficulties, we propose a character detection method for ancient Yi books based on connected components and regressive character segmentation. First, the scanned images of ancient Yi books are denoised with non-local means filtering, and a modified local adaptive threshold binarization algorithm then produces binary images that separate the foreground from the background. Second, non-text areas are removed with a connected-component-based method. Finally, individual characters are segmented with the proposed regressive character segmentation method. The experimental results show that the method effectively separates text areas from non-text areas in ancient Yi books, achieves higher accuracy and recall in the character detection experiments, and effectively addresses character detection and segmentation for the recognition of ancient books.

Keywords: Computing methodologies, interest point, salient region detections, image segmentation.
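The connected-component step named in the abstract can be illustrated with a minimal sketch. The abstract does not state which criteria are used to reject non-text components, so the rule below (discard components that are too small or whose bounding box is too large) is an assumption made only for illustration. The function names, the 8-connectivity choice, and the `min_area` and `max_side` thresholds are likewise hypothetical and are not taken from the paper.

```python
from collections import deque


def connected_components(binary):
    """Find 8-connected foreground components in a 0/1 grid (list of lists)."""
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            components.append(
                {"pixels": pixels, "bbox": (min(ys), min(xs), max(ys), max(xs))}
            )
    return components


def remove_non_text(binary, min_area=2, max_side=4):
    """Keep components whose size is plausible for a character.

    Assumed rule (not from the paper): drop specks smaller than min_area
    pixels and components whose bounding box exceeds max_side on any side.
    """
    cleaned = [[0] * len(row) for row in binary]
    for comp in connected_components(binary):
        top, left, bottom, right = comp["bbox"]
        height = bottom - top + 1
        width = right - left + 1
        if len(comp["pixels"]) >= min_area and max(height, width) <= max_side:
            for y, x in comp["pixels"]:
                cleaned[y][x] = 1
    return cleaned
```

On a real page the thresholds would need to be tuned to the scan resolution and the typical character size, and the binary input would come from the denoising and binarization steps described above.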
