Extracting Medical Information from Paper COVID-19 Assessment Forms

OBJECTIVE  This study examines the validity of optical mark recognition, a novel user interface, and crowdsourced data validation to rapidly digitize and extract data from paper COVID-19 assessment forms at a large medical center.

METHODS  An optical mark recognition/optical character recognition (OMR/OCR) system was developed to identify the fields selected on 2,814 paper assessment forms, each with 141 fields, that were used to assess potential COVID-19 infections. To ease data validation, a novel user interface (UI) displayed mirrored forms: the scanned assessment form with OMR results superimposed on the left, and an editable web form on the right. Crowdsourced participants validated the results of the OMR system. The overall error rate and the time taken to validate were calculated. A subset of forms was validated by multiple participants to calculate agreement between participants.

RESULTS  When the OMR/OCR results were compared with the results from crowd validation, the OMR/OCR tools had correctly extracted data from scanned form fields with an average accuracy of 70% and a median accuracy of 78%. Scanned forms were crowd-validated at a mean of 157 seconds per document and a volume of approximately 108 documents per day. A randomly selected subset of documents was reviewed by multiple participants, producing an interobserver agreement of 97% when narrative-text fields were included and 98% when only Boolean and multiple-choice fields were considered.

CONCLUSION  During the COVID-19 pandemic, health care workers wearing personal protective equipment may find it challenging to interact with electronic health records. The combination of OMR/OCR technology, a novel UI, and crowdsourced data validation allowed for the efficient extraction of data from a large volume of paper medical documents produced during the COVID-19 pandemic.
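The abstract does not give the formulas behind the accuracy and interobserver-agreement figures. As a minimal sketch only, assuming accuracy is the fraction of form fields on which the OMR/OCR output matches the crowd-validated value, and agreement is the mean fraction of fields on which each pair of reviewers gives the same value, the calculation might look as follows. The function names and the field names (`fever`, `cough`, and so on) are hypothetical and are not taken from the study.

```python
from itertools import combinations
from statistics import mean


def field_accuracy(omr_fields, validated_fields):
    """Fraction of validated fields whose OMR/OCR value matches.

    Assumption: the crowd-validated values are treated as ground truth.
    """
    keys = list(validated_fields)
    matches = sum(omr_fields.get(k) == validated_fields[k] for k in keys)
    return matches / len(keys)


def pairwise_agreement(reviews):
    """Mean fraction of shared fields on which each reviewer pair agrees.

    Assumption: simple percent agreement, not a chance-corrected statistic.
    """
    scores = []
    for a, b in combinations(reviews, 2):
        keys = a.keys() & b.keys()
        scores.append(sum(a[k] == b[k] for k in keys) / len(keys))
    return mean(scores)


# Hypothetical single form with four Boolean fields.
omr = {"fever": True, "cough": False, "travel": True, "contact": False}
validated = {"fever": True, "cough": True, "travel": True, "contact": False}

# Hypothetical reviews of the same form by three participants.
reviewer_1 = dict(validated)
reviewer_2 = dict(validated)
reviewer_3 = {**validated, "travel": False}

accuracy = field_accuracy(omr, validated)
agreement = pairwise_agreement([reviewer_1, reviewer_2, reviewer_3])
```

Per-form accuracies computed this way could then be averaged, or their median taken, across all forms; whether the study aggregated per form or per field is not stated in the abstract.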
