Automated Categorization of Systemic Disease and Duration From Electronic Medical Record System Data Using Finite-State Machine Modeling: Prospective Validation Study

Background: One of the major challenges in the health care sector is that approximately 80% of generated data remains unstructured and unused. Because unstructured data from electronic medical record systems are difficult to handle, most hospitals and medical centers leave them out of their analyses. There is therefore a need to analyze unstructured big data in health care systems so that the unexploited information in it can be put to use.

Objective: In this study, we aimed to extract a list of diseases and associated keywords, along with the corresponding time durations, from an indigenously developed electronic medical record system and to describe the analytics made possible by the acquired datasets.

Methods: We propose a novel finite-state machine to sequentially detect and cluster disease names from patients’ medical history. We defined 3 states and a transition matrix for the finite-state machine, with transitions determined by the identified keyword. We also defined a state-change action matrix, which associates an action with each transition. The dataset used in this study was obtained from an indigenously developed electronic medical record system called eyeSmart, which was implemented across a large, multitier ophthalmology network in India. The dataset included patients’ past medical history and contained records of 10,000 distinct patients.

Results: We extracted disease names and associated keywords by using the finite-state machine with an accuracy of 95%, sensitivity of 94.9%, and positive predictive value of 100%. For the extraction of the duration of disease, the machine’s accuracy was 93%, sensitivity was 92.9%, and positive predictive value was 100%.

Conclusions: We demonstrated that the finite-state machine developed in this study can accurately identify disease names, associated keywords, and time durations from a large cohort of patient records obtained using an electronic medical record system.
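The Methods describe a finite-state machine with 3 states, a keyword-driven transition matrix, and a state-change action matrix. The abstract does not name the states, keyword classes, or actions, so the following is only a minimal sketch under assumptions: the state names (IDLE, DISEASE, DURATION), the token classes, the tiny lexicon, and the action labels are all hypothetical and are not the authors' implementation.

```python
import re
from enum import Enum, auto

class State(Enum):      # hypothetical 3 states
    IDLE = auto()       # no disease currently open
    DISEASE = auto()    # a disease name has been seen
    DURATION = auto()   # a number has been seen after a disease

class Tok(Enum):        # hypothetical keyword classes
    DISEASE = auto()
    NUMBER = auto()
    UNIT = auto()
    OTHER = auto()

# Hypothetical lexicon; a real system would need a far larger one.
DISEASES = {"diabetes", "hypertension", "asthma"}
UNITS = {"year", "years", "month", "months", "day", "days"}

def classify(word):
    if word in DISEASES:
        return Tok.DISEASE
    if word.isdigit():
        return Tok.NUMBER
    if word in UNITS:
        return Tok.UNIT
    return Tok.OTHER

D, N, U, O = Tok.DISEASE, Tok.NUMBER, Tok.UNIT, Tok.OTHER

# Transition matrix: next state for each (current state, keyword class).
TRANSITION = {
    State.IDLE:     {D: State.DISEASE, N: State.IDLE,     U: State.IDLE,    O: State.IDLE},
    State.DISEASE:  {D: State.DISEASE, N: State.DURATION, U: State.DISEASE, O: State.DISEASE},
    State.DURATION: {D: State.DISEASE, N: State.DURATION, U: State.IDLE,    O: State.DISEASE},
}

# State-change action matrix: the action tied to each transition.
ACTION = {
    State.IDLE:     {D: "start", N: "none", U: "none",  O: "none"},
    State.DISEASE:  {D: "start", N: "hold", U: "none",  O: "none"},
    State.DURATION: {D: "start", N: "hold", U: "close", O: "drop"},
}

def extract(text):
    """Return a list of (disease, duration or None) pairs found in text."""
    state, current, number, records = State.IDLE, None, None, []
    for word in re.findall(r"[a-z]+|\d+", text.lower()):
        tok = classify(word)
        action = ACTION[state][tok]
        if action == "start":            # open a new disease record
            if current is not None:      # flush a disease left without duration
                records.append((current, None))
            current, number = word, None
        elif action == "hold":           # remember a candidate duration value
            number = word
        elif action == "drop":           # number was not followed by a time unit
            number = None
        elif action == "close":          # number + unit completes the duration
            records.append((current, f"{number} {word}"))
            current, number = None, None
        state = TRANSITION[state][tok]
    if current is not None:              # flush the last open disease
        records.append((current, None))
    return records
```

For example, `extract("Diabetes since 10 years, hypertension for 6 months, asthma")` pairs the first two diseases with their durations and returns asthma with no duration. Keeping the transitions and actions in two separate lookup tables mirrors the two matrices described in the Methods, so changing the behavior means editing a table entry rather than the control flow.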
