Automated Digitization of Unstructured Medical Prescriptions

Automated digitization of prescription images is a critical prerequisite for scaling digital healthcare services such as online pharmacies. This is challenging in emerging markets, since prescriptions are not digitized at source and patients lack the medical expertise to interpret them and place orders. In this paper, we present a prescription digitization system for online medicine ordering, built with minimal supervision. Our system uses a modular pipeline comprising a mix of ML and rule-based components for (a) image-to-text extraction, (b) segmentation into blocks and medication items, (c) medication attribute extraction, (d) matching against a medicine catalog, and (e) shopping cart building. Our approach efficiently combines multiple signals, including layout, medical ontologies, and semantic embeddings from a LayoutLMv2 model, to yield substantial improvement over strong baselines on medication attribute extraction. For shopping cart building on printed prescriptions, our pipeline achieves a +5.9% gain in precision@3 and a +5.6% gain in recall@3 over a catalog-based fuzzy matching baseline.
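To make the evaluation setup concrete, the following is a minimal sketch of a catalog-based fuzzy matching baseline and of precision@k and recall@k. It is an illustration under stated assumptions, not the system described above: the function names `fuzzy_top_k` and `precision_recall_at_k`, the toy catalog, and the use of character-level similarity from Python's standard `difflib` module are all hypothetical choices, and the paper's exact metric definitions may differ.

```python
from difflib import SequenceMatcher


def fuzzy_top_k(query, catalog, k=3):
    """Return the k catalog names most similar to the query string.

    Similarity is difflib's character-level ratio; this is an assumed
    stand-in for whatever fuzzy matcher a real baseline would use.
    """
    scored = [
        (SequenceMatcher(None, query.lower(), name.lower()).ratio(), name)
        for name in catalog
    ]
    # Sort by similarity only, highest first (stable for ties).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]


def precision_recall_at_k(ranked, relevant, k=3):
    """Compute precision@k and recall@k for one ranked suggestion list.

    precision@k = relevant items in the top k / k
    recall@k    = relevant items in the top k / number of relevant items
    """
    top = ranked[:k]
    hits = sum(1 for item in top if item in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical catalog and a noisy OCR-style query.
    catalog = [
        "Paracetamol 500mg Tablet",
        "Amoxicillin 250mg Capsule",
        "Ibuprofen 400mg Tablet",
    ]
    suggestions = fuzzy_top_k("paracetmol 500 mg", catalog, k=3)
    print(suggestions)
    print(precision_recall_at_k(suggestions, {"Paracetamol 500mg Tablet"}, k=3))
```

A pure string matcher like this ignores layout and medication attributes such as strength or dosage form, which are the additional signals the pipeline above is described as using.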
