Multiple Document Datasets Pre-training Improves Text Line Detection With Deep Neural Networks

In this paper, we introduce a fully convolutional network for the document layout analysis task. While state-of-the-art methods use models pre-trained on natural scene images, our method, Doc-UFCN, relies on a U-shaped model trained from scratch to detect objects in historical documents. We treat line segmentation, and layout analysis more generally, as a pixel-wise classification task, so our model outputs a pixel-level labeling of the input images. We show that Doc-UFCN outperforms state-of-the-art methods on various datasets and that components pre-trained on natural scene images are not required to reach good results. In addition, we show that pre-training on multiple document datasets can improve performance. We evaluate the models with various metrics to provide a fair and complete comparison between methods.
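Treating segmentation as pixel-wise classification means the network emits a score per class for every pixel, and the final labeling is decided pixel by pixel. The framework-free sketch below illustrates that output stage and one common pixel-level metric. It is not Doc-UFCN's code: the two-class setup (background vs. text line), the helper names `label_map` and `pixel_iou`, and the choice of per-class intersection over union (IoU) as the example metric are hypothetical assumptions made for illustration.

```python
from typing import List, Sequence

# Hypothetical class indices (an assumption): 0 = background, 1 = text line.
BACKGROUND, TEXT_LINE = 0, 1


def label_map(probs: Sequence[Sequence[Sequence[float]]]) -> List[List[int]]:
    """Turn per-pixel class scores (H x W x C) into a label map (H x W) by argmax.

    Ties go to the lowest class index, because max() returns the first maximum.
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in probs
    ]


def pixel_iou(pred: Sequence[Sequence[int]],
              truth: Sequence[Sequence[int]],
              cls: int) -> float:
    """Intersection over union of the pixels labeled `cls` in two label maps."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            in_pred, in_truth = p == cls, t == cls
            inter += int(in_pred and in_truth)
            union += int(in_pred or in_truth)
    # Convention assumed here: a class absent from both maps scores 1.0.
    return inter / union if union else 1.0


# A toy 2 x 3 "image": each pixel holds [score(background), score(text line)].
probs = [
    [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]],
    [[0.6, 0.4], [0.4, 0.6], [0.8, 0.2]],
]
pred = label_map(probs)              # [[0, 1, 1], [0, 1, 0]]
truth = [[0, 1, 1], [0, 0, 0]]       # hypothetical ground-truth labeling
print(pixel_iou(pred, truth, TEXT_LINE))   # prints 0.6666666666666666
print(pixel_iou(pred, truth, BACKGROUND))  # prints 0.75
```

Scoring each class separately shows why several metrics are useful: in this toy case the background IoU (0.75) looks better than the text-line IoU (2/3), because a single falsely predicted text pixel weighs more against the smaller text-line class.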
