Dense and Tight Detection of Chinese Characters in Historical Documents: Datasets and a Recognition Guided Detector

Characters in historical documents are typically densely distributed, which makes them difficult to localize and segment by directly applying classic proposal-based and regression-based methods. In this paper, we propose a novel method called recognition guided detector (RGD) that achieves tight Chinese character detection in historical documents. The proposed RGD consists of two simultaneously trained convolutional neural networks: a recognition guided proposal network that provides context information about the text, and a detection network that uses this information to localize each character accurately. To train and test the proposed method, we established two new datasets with character-level annotations, comprising ground truth character bounding boxes and the ground truth character in each box. The datasets consist of scanned images collected from nine different versions of Tripitaka in Han. Experimental results show that, guided by a text recognition network with a test accuracy of 97.25%, the detection network in our proposed method achieves a much higher F-score with fewer parameters than several state-of-the-art object detection and text detection methods under a highly constrained evaluation criterion of intersection over union (IoU) ≥ 0.7. The datasets are publicly available at https://github.com/HCIILAB/TKH_MTH_Datasets_Release for non-commercial use.
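To make the evaluation criterion concrete, the following is a minimal sketch of how an F-score under an IoU ≥ 0.7 threshold could be computed for axis-aligned character boxes. It is an illustration only, not the evaluation code used in the paper: the `(x_min, y_min, x_max, y_max)` box format, the function names `iou` and `f_score`, and the greedy one-to-one matching of predictions to ground truth boxes are all assumptions.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f_score(preds: List[Box], gts: List[Box], thresh: float = 0.7) -> float:
    """F-score where a prediction counts as correct if it overlaps a
    not-yet-matched ground truth box with IoU >= thresh.

    Greedy one-to-one matching is an assumption made for this sketch.
    """
    matched = set()
    true_pos = 0
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gts):
            if j in matched:
                continue
            v = iou(p, g)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_j >= 0 and best_iou >= thresh:
            matched.add(best_j)
            true_pos += 1
    precision = true_pos / len(preds) if preds else 0.0
    recall = true_pos / len(gts) if gts else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Two adjacent characters; the second prediction is shifted by 2 pixels.
gts = [(0, 0, 10, 10), (20, 0, 30, 10)]
preds = [(0, 0, 10, 10), (22, 0, 32, 10)]
print(f_score(preds, gts, thresh=0.7))  # 0.5: the shifted box has IoU of about 0.667
print(f_score(preds, gts, thresh=0.5))  # 1.0: both boxes pass the looser threshold
```

In this toy example a 2-pixel shift on a 10-pixel-wide character is enough to fail the 0.7 threshold while passing 0.5, which illustrates why a strict IoU criterion rewards tight localization of small, densely packed characters.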
