Using Language Models to Identify Relevant New Information in Inpatient Clinical Notes

Redundant information in clinical notes within electronic health record (EHR) systems is ubiquitous and may negatively impact the use of these notes by clinicians and, potentially, the efficiency of patient care delivery. Automated methods to distinguish redundant from relevant new information may provide a valuable tool for clinicians to better synthesize patient information and navigate to clinically important details. In this study, we investigated the use of language models to identify new information in inpatient notes, and evaluated our methods against expert-derived reference standards. The best method achieved precision of 0.743, recall of 0.832, and F1-measure of 0.784. The average proportion of redundant information was similar between inpatient and outpatient progress notes (76.6% (SD=17.3%) and 76.7% (SD=14.0%), respectively). Advanced practice providers tended to have higher rates of redundancy in their notes compared to physicians. Future work includes the addition of semantic components and visualization of new information.
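To illustrate the general idea of flagging new versus redundant content against a patient's prior notes, the sketch below uses a simple n-gram overlap heuristic. This is a hypothetical stand-in, not the language-model method evaluated in the study; the function names, the bigram order, and the 0.5 threshold are all illustrative assumptions.

```python
from collections import Counter

def ngrams(tokens, n=2):
    # All contiguous n-grams of a token list.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def overlap_score(sentence, history_counts, n=2):
    # Fraction of the sentence's n-grams already observed in prior notes.
    grams = ngrams(sentence.lower().split(), n)
    if not grams:
        return 0.0
    seen = sum(1 for g in grams if history_counts[g] > 0)
    return seen / len(grams)

def flag_new_sentences(prior_notes, current_note, threshold=0.5, n=2):
    # Sentences whose overlap with prior notes falls below the
    # (assumed) threshold are flagged as containing new information.
    history = Counter()
    for note in prior_notes:
        history.update(ngrams(note.lower().split(), n))
    return [s for s in current_note
            if overlap_score(s, history, n) < threshold]

prior = ["patient admitted with chest pain", "history of hypertension"]
current = ["patient admitted with chest pain",
           "new onset atrial fibrillation noted"]
print(flag_new_sentences(prior, current))
```

In practice, a trained language model would score sentences by their probability given the preceding notes rather than by raw n-gram overlap, but the thresholding structure is analogous.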