Logical Layout Analysis using Deep Learning

Logical layout analysis plays an important part in document understanding, and it is challenging because document formats and layouts vary widely. Researchers have proposed different ways to solve this problem, mostly relying on visual information and a complex pipeline. In this paper, we present a simple technique for labelling the logical structures in document images. We use both visual and textual features from the document images to label zones. We utilize Recurrent Neural Networks, specifically 2 layers of LSTM, which take as input the text of the zone to be classified, as a sequence of words, together with the position of each word normalized by the page width and height. The image under test is compared with known layouts, and labels are assigned to zones accordingly. The labels are abstract, title, author names, and affiliation. Beyond layout, the text itself also carries very important information for the task at hand. The presented approach achieved an overall accuracy of 96.21% on the publicly available MARG dataset.
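The input representation described above can be illustrated with a minimal sketch. It only shows how a zone's words might be turned into a sequence of (word, normalized x, normalized y) items before being fed to a sequence model; it is not the authors' implementation. The `Word` class, the `zone_to_sequence` function, and the choice of the top-left corner in pixels as the word position are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# The four zone labels named in the abstract.
LABELS = ["abstract", "title", "author names", "affiliation"]


@dataclass
class Word:
    """Hypothetical container for one OCR word inside a zone."""
    text: str
    x: float  # assumed: left edge of the word, in pixels
    y: float  # assumed: top edge of the word, in pixels


def zone_to_sequence(
    words: List[Word], page_width: float, page_height: float
) -> List[Tuple[str, float, float]]:
    """Turn a zone's words into (word, x / page_width, y / page_height) items.

    Dividing by the page dimensions puts positions in the range [0, 1],
    so pages of different sizes become comparable.
    """
    if page_width <= 0 or page_height <= 0:
        raise ValueError("page dimensions must be positive")
    return [(w.text, w.x / page_width, w.y / page_height) for w in words]
```

In a full model, each word would typically be mapped to an embedding vector and concatenated with its two normalized coordinates before entering the LSTM layers; those steps are omitted here because the abstract does not specify them.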