Medical prescription classification: an NLP-based approach

The digitization of healthcare data has become essential over the last decade for managing the vast amount of data generated by healthcare organizations. Carried out effectively, this process is an enabling resource that will improve the provision of healthcare services, as well as cutting-edge related applications ranging from clinical text mining to predictive modelling, survival analysis, patient similarity, genetic data analysis and many others. The application presented in this work concerns the digitization of medical prescriptions, issued either to authorize healthcare services or to grant reimbursement for medical expenses. The proposed system first extracts text from scanned medical prescriptions; Natural Language Processing and machine learning techniques then classify the prescriptions effectively, exploiting embedded terms and categories covering patient/doctor personal data, symptoms, pathology, diagnosis and suggested treatments. A RESTful Web Service is introduced, together with the results of prescription classification over a set of more than 800K diagnostic statements.
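The abstract does not state which classifier the system uses, so the following is only a hedged illustration of the classification stage, not the authors' method. It applies one standard text-classification technique, multinomial Naive Bayes with add-one (Laplace) smoothing, to short statements of the kind a prescription might contain. The category names (`symptom`, `diagnosis`, `treatment`), the toy training sentences, and the class `NaiveBayesClassifier` are hypothetical, and the OCR step is assumed to have already produced plain text.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep only alphabetic tokens (digits are dropped)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClassifier:
    """Hypothetical multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.class_counts = Counter()            # number of training statements per class
        self.word_counts = defaultdict(Counter)  # token frequencies per class
        self.vocabulary = set()                  # all tokens seen during training

    def fit(self, texts: list[str], labels: list[str]) -> NaiveBayesClassifier:
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text: str) -> str:
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Start from the log prior P(class).
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocabulary:
                    continue  # tokens never seen in training carry no evidence
                count = self.word_counts[label][token]
                # Add the smoothed log likelihood P(token | class).
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, invented purely for illustration.
train_texts = [
    "persistent dry cough and fever",
    "headache and nausea since two days",
    "diagnosis of type 2 diabetes",
    "confirmed hypertension diagnosis",
    "prescribe amoxicillin 500 mg twice daily",
    "physiotherapy sessions for lower back",
]
train_labels = ["symptom", "symptom", "diagnosis", "diagnosis", "treatment", "treatment"]

model = NaiveBayesClassifier().fit(train_texts, train_labels)
print(model.predict("fever and cough"))
print(model.predict("new diagnosis of diabetes"))
print(model.predict("amoxicillin twice daily"))
```

Naive Bayes is chosen here only because it is compact and needs no external libraries; a real deployment would train on the labelled diagnostic statements, clean OCR errors before tokenization, and could substitute any other supervised classifier.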
