Table Analysis and Information Extraction for Medical Laboratory Reports

Medical laboratory reports are essential documents for health care professionals in patient assessment, diagnosis, and long-term monitoring. Compared with paper files, electronic records are easier to keep up to date, complete, and accurate, and they are already common in modern medical systems. However, recognizing the content of historical medical laboratory reports is still in great demand, especially in developing countries. In this paper, we present a document image processing system for extracting information from medical laboratory reports. Given an image of a medical laboratory report, its table areas and texts are first segmented following a top-down pipeline. Recognition is then performed on every text segment, which may contain Arabic numerals, mathematical symbols, and multilingual characters. We evaluate the system on a new dataset of medical laboratory reports that includes both scanned and camera-captured images. Our experiments demonstrate that the proposed system can effectively segment a medical document according to its layout and recognize text that mixes multiple types of characters and symbols, thereby extracting information from medical laboratory reports. The proposed system and the public dataset will benefit remote healthcare in developing countries.
