Recognizing Challenging Handwritten Annotations with Fully Convolutional Networks

This paper introduces a very challenging dataset of historical German documents and evaluates methods based on Fully Convolutional Neural Networks (FCNNs) for locating handwritten annotations of any kind in these documents. The annotations appear as underlines and as text written with various writing instruments; annotations made in pencil, for example, make the data especially challenging. The task is to classify each pixel of a document into one of two classes: background or handwritten annotation. We train and evaluate several end-to-end semantic segmentation approaches and report their results. The best model achieves a mean Intersection over Union (IoU) score of 95.6% on the test documents of the presented dataset. We also compare different data augmentation and training strategies on this dataset. For evaluation, we use the Layout Analysis Evaluator from the ICDAR 2017 Competition on Layout Analysis for Challenging Medieval Manuscripts.
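The task is framed as two-class pixel classification scored by mean IoU. As an illustration only, the sketch below computes the IoU of each class as TP / (TP + FP + FN) and averages it over the background and annotation classes. It assumes the predicted and ground-truth label maps have been flattened row by row into integer lists, with 0 = background and 1 = handwritten annotation. The names `class_ious` and `mean_iou` are hypothetical, and this is not the code of the ICDAR 2017 Layout Analysis Evaluator, which may differ in details.

```python
from typing import Dict, Sequence

# Hypothetical label convention assumed for this sketch.
BACKGROUND = 0
ANNOTATION = 1


def class_ious(pred: Sequence[int], target: Sequence[int], num_classes: int = 2) -> Dict[int, float]:
    """Per-class IoU = TP / (TP + FP + FN) over flattened pixel label maps."""
    if len(pred) != len(target):
        raise ValueError("prediction and ground truth must have the same number of pixels")
    if not pred:
        raise ValueError("label maps must not be empty")
    for label in list(pred) + list(target):
        if not 0 <= label < num_classes:
            raise ValueError(f"label {label} outside range 0..{num_classes - 1}")

    ious: Dict[int, float] = {}
    for cls in range(num_classes):
        tp = fp = fn = 0
        for p, t in zip(pred, target):
            if p == cls and t == cls:
                tp += 1  # pixel correctly assigned to this class
            elif p == cls:
                fp += 1  # predicted as this class, but it is not
            elif t == cls:
                fn += 1  # belongs to this class, but was missed
        union = tp + fp + fn
        if union > 0:  # skip a class absent from both prediction and ground truth
            ious[cls] = tp / union
    return ious


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int = 2) -> float:
    """Average the per-class IoU values over the classes that occur."""
    ious = class_ious(pred, target, num_classes)
    return sum(ious.values()) / len(ious)


if __name__ == "__main__":
    # A tiny 2x4 "page", flattened row by row.
    target = [0, 0, 1, 1, 0, 0, 0, 1]
    pred = [0, 1, 1, 1, 0, 0, 0, 0]
    print(class_ious(pred, target))  # {0: 0.666..., 1: 0.5}
    print(mean_iou(pred, target))    # (2/3 + 1/2) / 2 = 0.5833...
```

Averaging per class, rather than computing pixel accuracy, keeps the small annotation class from being swamped by the much larger background class.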
