Visual domain knowledge-based multimodal zoning for textual region localization in noisy historical document images

Abstract. Document layout analysis, or zoning, is important for textual content analysis such as optical character recognition. Zoning document images such as digitized historical newspaper pages is challenging due to noise and the poor quality of the document images. Recently, effective data-driven approaches, such as those leveraging deep learning, have been proposed, albeit with the concern that they require large amounts of training data and thus incur the additional cost of ground truthing. We propose a zoning solution that incorporates a knowledge-driven document representation, the gravity map, into a multimodal deep learning framework to reduce the time and data required for training. First, we generate a gravity map for each image, considering the centroid distance and the area between each cell of a Voronoi tessellation and its content, to encode visual domain knowledge of the zoning task. Second, we inject the gravity maps into a deep convolutional neural network (DCNN) during training as an additional modality to boost performance. We report on two investigations using two state-of-the-art DCNN architectures and three datasets: two sets of historical newspapers and a set of born-digital contemporary documents. Evaluations show that our solution achieved comparable segmentation accuracy with fewer training epochs and less training data than a naïve training scheme.
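The abstract names two steps: building a gravity map from a Voronoi tessellation of the page, then feeding that map to a DCNN as an extra input modality. It does not state the exact gravity formula or the fusion point, so the following is only a minimal pure-Python sketch under loudly stated assumptions. The function names `binarize`, `connected_components`, `gravity_map` and `fuse_channels`, the ink threshold of 128, and the per-cell score `area_ratio / (1 + offset)` are all hypothetical. The tessellation is approximated by a point Voronoi diagram seeded at connected-component centroids, not the area Voronoi diagram used in some earlier zoning work. Fusion is shown as simple early (input-level) channel stacking, which may differ from how the paper actually injects the map.

```python
from collections import deque
import math


def binarize(gray, threshold=128):
    """Mark dark pixels (ink) as 1 and background as 0; the threshold is an assumption."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]


def centroid(pixels):
    n = len(pixels)
    return (sum(y for y, _ in pixels) / n, sum(x for _, x in pixels) / n)


def connected_components(img):
    """Group 8-connected ink pixels with a breadth-first search."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps


def gravity_map(img):
    """Hypothetical gravity map: one score per Voronoi cell, copied to every pixel of the cell."""
    h, w = len(img), len(img[0])
    seeds = [centroid(c) for c in connected_components(img)]
    if not seeds:
        return [[0.0] * w for _ in range(h)]

    def nearest(y, x):
        # Point-Voronoi approximation: a pixel belongs to the closest component centroid.
        return min(range(len(seeds)),
                   key=lambda k: (y - seeds[k][0]) ** 2 + (x - seeds[k][1]) ** 2)

    owner = [[nearest(y, x) for x in range(w)] for y in range(h)]
    cells = [[] for _ in seeds]
    for y in range(h):
        for x in range(w):
            cells[owner[y][x]].append((y, x))

    scores = []
    for cell in cells:
        ink = [(y, x) for y, x in cell if img[y][x] == 1]
        if not ink:  # empty cell or cell without content
            scores.append(0.0)
            continue
        area_ratio = len(ink) / len(cell)  # content area over cell area, in (0, 1]
        (cy, cx), (iy, ix) = centroid(cell), centroid(ink)
        offset = math.hypot(cy - iy, cx - ix)  # cell-to-content centroid distance
        scores.append(area_ratio / (1.0 + offset))  # hypothetical combination rule
    return [[scores[owner[y][x]] for x in range(w)] for y in range(h)]


def fuse_channels(gray, gmap):
    """Early fusion: stack the scaled image and its gravity map as a 2 x H x W input."""
    return [[[p / 255.0 for p in row] for row in gray], [list(row) for row in gmap]]


if __name__ == "__main__":
    # Two 2 x 2 ink blobs on a white 6 x 8 page.
    page = [[0 if (y < 2 and x < 2) or (y > 3 and x > 5) else 255 for x in range(8)]
            for y in range(6)]
    fused = fuse_channels(page, gravity_map(binarize(page)))
    print(len(fused), len(fused[0]), len(fused[0][0]))  # 2 6 8
```

In this sketch, the fused 2 x H x W array would be fed to a segmentation network whose first convolution takes two input channels instead of one. Stacking at the input is the simplest multimodal option; intermediate or late fusion would instead merge features from separate image and gravity-map branches deeper in the network.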
