OPAD: An Optimized Policy-based Active Learning Framework for Document Content Analysis

Sumit Shekhar, Bhanu Prakash Reddy Guda, Ashutosh Chaubey, Ishan Jindal, and Avneet Jain

Woodstock ’18, June 03–05, 2018, Woodstock, NY. arXiv:2110.02069v2 [cs.IR], 7 Oct 2021.

Documents are central to many business systems, and include forms, reports, contracts, invoices or purchase orders. The information in documents is typically in natural language, but can be organized in various layouts and formats. There has been a recent spurt of interest in understanding document content with novel deep learning architectures. However, document understanding tasks need dense information annotations, which are costly to scale and generalize. Several active learning techniques have been proposed to reduce the overall annotation budget while maintaining the performance of the underlying deep learning model. Most of these techniques, however, work only for classification problems; content detection is a more complex task and has scarcely been explored in the active learning literature. In this paper, we propose OPAD, a novel framework that uses a reinforcement policy for active learning in content detection tasks for documents. The proposed framework learns the acquisition function that decides which samples to select, while optimizing the performance metrics typically used for these tasks. Furthermore, we extend the framework to weak labelling scenarios to reduce the cost of annotation significantly.

We propose novel rewards that account for class imbalance and for user feedback in the annotation interface, to improve the active learning method. We show superior performance of the proposed OPAD framework for active learning on various document understanding tasks, such as layout parsing, object detection and named entity recognition. Ablation studies for the human feedback and class imbalance rewards are presented, along with a comparison of annotation times for different approaches.
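As a rough illustration of the idea of a learned acquisition step scored by a task-metric reward with a class-imbalance term, the sketch below may help. It is not the OPAD method: the function names, the entropy-based balance bonus, and the `balance_weight` parameter are all hypothetical choices made for this example, since the abstract does not specify the reward formulas.

```python
import math
from collections import Counter


def class_balance_bonus(labels, num_classes):
    """Normalized entropy of the selected labels, in [0, 1].

    Hypothetical stand-in for a class-imbalance reward: 1.0 means the
    selected batch covers all `num_classes` classes evenly.
    """
    if num_classes < 2 or not labels:
        return 0.0
    counts = Counter(labels)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(num_classes)


def reward(metric_before, metric_after, selected_labels, num_classes,
           balance_weight=0.1):
    """Hypothetical reward: gain in the task metric plus a balance bonus."""
    gain = metric_after - metric_before
    return gain + balance_weight * class_balance_bonus(selected_labels, num_classes)


def select_batch(policy_scores, budget):
    """Return indices of the `budget` samples the policy scores highest."""
    ranked = sorted(range(len(policy_scores)),
                    key=lambda i: policy_scores[i], reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    # Toy pool of five unlabeled samples with assumed policy scores.
    chosen = select_batch([0.2, 0.9, 0.4, 0.7, 0.1], budget=2)
    print("selected indices:", chosen)
    # Assumed metric values before and after retraining on the new labels.
    print("reward:", reward(0.50, 0.55, ["table", "figure"], num_classes=2))
```

In a full system the policy scores would come from a trained policy network and the metric values from re-evaluating the detection model after each annotation round; here both are supplied by hand to keep the example self-contained.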
