Paper information - Multiple Document Datasets Pre-training Improves Text Line Detection With Deep Neural Networks

In this paper, we introduce a fully convolutional network for document layout analysis. While state-of-the-art methods use models pre-trained on natural scene images, our method, Doc-UFCN, relies on a U-shaped model trained from scratch to detect objects in historical documents. We treat line segmentation, and more generally layout analysis, as a pixel-wise classification task, so our model outputs a pixel-level labeling of the input images. We show that Doc-UFCN outperforms state-of-the-art methods on various datasets and demonstrate that components pre-trained on natural scene images are not required to reach good results. In addition, we show that pre-training on multiple document datasets can improve performance. We evaluate the models using various metrics to provide a fair and complete comparison between the methods.
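To make the pixel-wise classification framing concrete, here is a minimal sketch in plain Python. It is not the Doc-UFCN implementation: the function names, the two-class setup (background and text line), and the choice of intersection-over-union as a pixel-level metric are all assumptions made for illustration, since the abstract does not name the metrics it uses.

```python
from typing import List

Grid = List[List[int]]


def label_pixels(scores: List[List[List[float]]]) -> Grid:
    """Turn per-class score maps into a label map.

    scores[c][y][x] is the (hypothetical) network score of class c
    at pixel (y, x). Each pixel receives its highest-scoring class.
    """
    n_classes = len(scores)
    height, width = len(scores[0]), len(scores[0][0])
    return [
        [max(range(n_classes), key=lambda c: scores[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


def pixel_iou(pred: Grid, truth: Grid, cls: int) -> float:
    """Pixel-level intersection-over-union for one class (assumed metric)."""
    inter = union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += (p == cls) and (t == cls)
            union += (p == cls) or (t == cls)
    # If the class appears in neither map, treat the two as in perfect agreement.
    return inter / union if union else 1.0


# Toy 2x3 "image": class 0 = background, class 1 = text line.
scores = [
    [[0.9, 0.2, 0.1], [0.8, 0.7, 0.6]],  # background scores
    [[0.1, 0.8, 0.9], [0.2, 0.3, 0.4]],  # text-line scores
]
truth = [[0, 1, 1], [0, 0, 1]]

pred = label_pixels(scores)
print(pred)
print(pixel_iou(pred, truth, cls=1))
```

In this toy case the predicted map misses one text-line pixel, so the text-line IoU is below 1. A real model would produce the score maps from its final convolutional layer over the full page image.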

Christopher Kermorvant | Thierry Paquet | Mélodie Boillet
