Deep Convolutional Neural Networks for Image Resolution Detection

In this paper, we present a novel approach based on convolutional neural networks (CNNs) to estimate the paper format (resolution in pixels per inch, PPI) of digitized document images. This format information is often required by commercial document analysis software, and a correct estimate helps high-level tasks such as OCR and layout analysis. The contribution of this work is two-fold. First, it presents an algorithm for the estimation of paper formats. Second, it introduces the first publicly available collection of documents (aggregated from public datasets) that can serve as a research benchmark. The collection is a mixture of modern and historical documents with PPI values ranging from 177 to 711. The task is modeled as regression, which gives more flexible results than classification (one class per format, e.g., A3, A4). For example, when the network is presented with an unknown format, it still returns a useful output. Furthermore, more categories can easily be learned by curriculum learning without modifying the network structure itself. On the proposed dataset, the network estimates the PPI values with an average deviation from the ground truth of only 14.8 PPI. On a private dataset stemming from health insurance companies, the average deviation is 6.8 PPI.
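To make the regression target and the reported error concrete, the following minimal sketch shows how a PPI value ties pixel dimensions to a physical page size, and how an average deviation from the ground truth could be computed. It is an illustration only: the function names and sample values are hypothetical, and reading "average deviation" as the mean absolute difference between predicted and true PPI is an assumption, not something the abstract states.

```python
from typing import Sequence, Tuple


def physical_size_inches(width_px: int, height_px: int, ppi: float) -> Tuple[float, float]:
    """Convert pixel dimensions to inches for a given resolution (hypothetical helper)."""
    if ppi <= 0:
        raise ValueError("ppi must be positive")
    return width_px / ppi, height_px / ppi


def mean_absolute_deviation(predicted: Sequence[float], ground_truth: Sequence[float]) -> float:
    """Average absolute difference between predicted and true PPI values.

    Assumption: this is how the 'average deviation' is interpreted here.
    """
    if len(predicted) != len(ground_truth) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - t) for p, t in zip(predicted, ground_truth))
    return total / len(predicted)


if __name__ == "__main__":
    # A 2480 x 3508 pixel scan at 300 PPI is close to A4 (210 mm x 297 mm).
    width_in, height_in = physical_size_inches(2480, 3508, 300.0)
    print(f"page size: {width_in:.2f} in x {height_in:.2f} in")

    # Made-up predictions and ground-truth values, for illustration only.
    predictions = [310.0, 290.0, 305.0]
    truths = [300.0, 300.0, 300.0]
    print(f"average deviation: {mean_absolute_deviation(predictions, truths):.2f} PPI")
```

Because the output is a continuous PPI value rather than a class label, an estimate for a previously unseen format can still be converted to a physical size in this way.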
