Deep Visual Template-Free Form Parsing

Automatic, template-free extraction of information from form images is challenging due to the variety of form layouts. This is even more challenging for historical forms because of noise and degradation. A crucial part of the extraction process is associating input text with pre-printed labels. We present a learned, template-free solution for detecting pre-printed text and input text/handwriting and for predicting pair-wise relationships between them. While previous approaches to this problem have focused on clean images and clear layouts, we show that our approach is effective on noisy, degraded, and varied form images. We introduce a new dataset of historical form images (late 1800s, early 1900s) for training and validating our approach. Our method uses a convolutional network to detect pre-printed text lines and input text lines. We pool features from the detection network to classify candidate relationships in a language-agnostic way. We show that our proposed pairing method outperforms heuristic rules and that visual features are critical to obtaining high accuracy.
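To make the pairing step concrete, the sketch below shows one way such a relationship classifier could be wired up: features are pooled from the detection backbone over the union box of each candidate (label, input) pair and scored by a small MLP. This is a minimal PyTorch sketch under our own assumptions (the PairClassifier name, the layer sizes, the single-image batch, and the union-box pooling strategy are illustrative choices), not the paper's exact implementation.

import torch
import torch.nn as nn
from torchvision.ops import roi_align

class PairClassifier(nn.Module):
    """Scores a candidate (pre-printed label, input text) pair using
    features pooled from the detection backbone's feature map."""

    def __init__(self, in_channels: int = 256, pool_size: int = 7):
        super().__init__()
        self.pool_size = pool_size
        self.head = nn.Sequential(
            nn.Linear(in_channels * pool_size * pool_size, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, 1),  # one logit: are these two text lines paired?
        )

    def forward(self, feats, boxes_a, boxes_b, spatial_scale=1.0 / 16):
        # feats: (1, C, H, W) backbone feature map for a single image.
        # boxes_a, boxes_b: (K, 4) detected boxes as (x1, y1, x2, y2)
        # in image coordinates; spatial_scale maps them to feature space.
        # Pool over the union box of each pair so the classifier sees both
        # text lines and the space between them.
        union = torch.cat(
            [
                torch.minimum(boxes_a[:, :2], boxes_b[:, :2]),  # x1, y1
                torch.maximum(boxes_a[:, 2:], boxes_b[:, 2:]),  # x2, y2
            ],
            dim=1,
        )
        # Batch index 0 for every RoI: assumes a single image per batch.
        batch_idx = torch.zeros(union.size(0), 1, device=union.device)
        rois = torch.cat([batch_idx, union], dim=1)  # (K, 5)
        pooled = roi_align(
            feats, rois, output_size=self.pool_size,
            spatial_scale=spatial_scale, sampling_ratio=2,
        )
        return self.head(pooled.flatten(1)).squeeze(1)  # (K,) pairing logits

In practice the candidate pairs would be pre-filtered (e.g., by proximity of the detected boxes) and the logits thresholded to decide which label/input pairs to keep; pooling visual features rather than reading the text is what keeps the classifier language-agnostic.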
