MoDest: Multi-module Design Validation for Documents

Information extraction (IE) from Visually Rich Documents (VRDs) is a common business need, where the extracted information serves purposes such as verification, design validation, or compliance. Most research on IE from VRDs has focused on textual documents such as invoices and receipts, while extracting information from multi-modal VRDs remains challenging. This work presents a novel end-to-end design validation framework for multi-modal VRDs containing textual and visual components, checking them for compliance against a pre-defined set of rules. The proposed Multi-mOdule DESign validaTion (MoDest) framework comprises two steps: (i) information extraction, using five modules to obtain the textual and visual components, followed by (ii) validation of the extracted components against a pre-defined set of design rules. Given an input multi-modal VRD image, the MoDest framework either accepts or rejects its design while providing an explanation for the decision. The framework is evaluated on a particular type of VRD, banking cards, under the real-world constraint of limited and highly imbalanced training data, with more than 99% of card designs belonging to one class (accepted). Experimental evaluation on real-world images from our in-house dataset demonstrates the effectiveness of the proposed MoDest framework, and analysis from its real-world deployment further strengthens its utility for design validation.
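The two-step flow described above can be illustrated with a minimal sketch. This is not the authors' implementation: the `Component` structure, the rule signature, and the `has_card_number` rule are all hypothetical stand-ins that only show how extracted components could be checked against pre-defined rules to produce an accept/reject decision with an explanation.

```python
# Illustrative sketch of the MoDest-style validation step (step ii).
# All names here are assumptions for illustration, not the paper's API.
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Component:
    kind: str                         # e.g. "text", "logo" (hypothetical labels)
    value: str                        # recognized text, or a symbolic label
    bbox: Tuple[int, int, int, int]   # x, y, w, h in the card image

# A design rule maps the extracted components to (passed, explanation).
Rule = Callable[[List[Component]], Tuple[bool, str]]

def has_card_number(components: List[Component]) -> Tuple[bool, str]:
    """Hypothetical rule: a 16-digit card number must be present."""
    ok = any(
        c.kind == "text"
        and c.value.replace(" ", "").isdigit()
        and len(c.value.replace(" ", "")) == 16
        for c in components
    )
    return ok, "16-digit card number present" if ok else "card number missing"

def validate_design(components: List[Component],
                    rules: List[Rule]) -> Tuple[bool, List[str]]:
    """Accept the design only if every pre-defined rule passes;
    return the decision together with per-rule explanations."""
    accepted = True
    explanations = []
    for rule in rules:
        passed, why = rule(components)
        explanations.append(("PASS: " if passed else "FAIL: ") + why)
        accepted = accepted and passed
    return accepted, explanations
```

In this sketch the rule list plays the role of the pre-defined design rules, and the collected explanation strings correspond to the framework's explanation for accepting or rejecting a card design.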
