Chargrid-OCR: End-to-end trainable Optical Character Recognition through Semantic Segmentation and Object Detection

We present an end-to-end trainable approach for optical character recognition (OCR) on printed documents. The approach predicts a two-dimensional character grid ("chargrid") representation of a document image, framed as a semantic segmentation task. To identify individual character instances in the chargrid, we treat characters as objects and apply object detection techniques from computer vision. We demonstrate experimentally that our method outperforms previous state-of-the-art approaches in accuracy, is easily parallelizable on a GPU and therefore significantly faster, and is easier to train.
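To make the chargrid idea concrete, the following minimal sketch builds such a grid from known character boxes: every pixel covered by a character receives that character's class index, and all other pixels are background. It illustrates only the target representation that a segmentation model would be trained to predict, not the network or the detection step. The box format, the `build_chargrid` function, the `BACKGROUND` index, and the toy vocabulary are assumptions made for illustration and do not come from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical box format: (character, left, top, right, bottom) in pixels,
# with right and bottom exclusive.
CharBox = Tuple[str, int, int, int, int]

BACKGROUND = 0  # assumed class index for pixels not covered by any character


def build_chargrid(
    boxes: List[CharBox], width: int, height: int, vocab: Dict[str, int]
) -> List[List[int]]:
    """Return a height x width grid of per-pixel character class indices."""
    grid = [[BACKGROUND] * width for _ in range(height)]
    for char, left, top, right, bottom in boxes:
        index = vocab[char]
        # Clip each box to the image bounds before filling it.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                grid[y][x] = index
    return grid


# Toy example: two characters side by side on a 6 x 3 pixel "page".
vocab = {"a": 1, "b": 2}
boxes = [("a", 0, 0, 2, 3), ("b", 3, 0, 5, 3)]
chargrid = build_chargrid(boxes, width=6, height=3, vocab=vocab)
for row in chargrid:
    print(row)
```

In this toy case each row of the grid reads `[1, 1, 0, 2, 2, 0]`: two pixels of class "a", a background gap, two pixels of class "b", and a background pixel. Note that the grid alone does not separate adjacent instances of the same character, which is why the abstract pairs segmentation with object detection.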
