Topic Modeling to Discern Irregular Order Patterns in Unlabeled Electronic Health Records

This paper presents an application of topic modeling to event sequences of Electronic Health Record (EHR) orders. By analogy with text, we treat each unlabeled clinical order event sequence as a document whose words are the events that occurred in the history of the order. We demonstrate the approach on Consult order data. The data sources, preprocessing steps, and data structures are described in detail. Latent Dirichlet Allocation (LDA) is fit to the limited datasets prepared, and an open-source tool, LDAvis, is used for exploratory analysis of the LDA results. Preliminary results reveal order patterns that were qualitatively evaluated as potentially irregular transitions. The goal of this analysis is to give domain experts an unsupervised learning application with which, in the absence of labeled data, they can investigate the captured patterns and identify irregular order transitions. Ultimately, such efforts will guide the formalization of hazard detection algorithms that monitor EHR data to identify hazards related to health information technology.
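The document analogy can be made concrete with a minimal sketch, given here for illustration only. It assumes each order history is available as a list of event labels and turns those histories into the document-term count matrix that an LDA implementation typically takes as input. The order identifiers, the event names, and the helper functions `build_vocabulary` and `to_count_matrix` are hypothetical and are not drawn from the paper's data or code.

```python
from collections import Counter

# Hypothetical order histories: each list is one order's event sequence.
# Event names are illustrative assumptions, not the paper's actual events.
order_histories = {
    "order_1": ["NEW", "PENDING", "SCHEDULED", "COMPLETE"],
    "order_2": ["NEW", "PENDING", "CANCELLED"],
    "order_3": ["NEW", "PENDING", "DISCONTINUED", "PENDING", "COMPLETE"],
}


def build_vocabulary(histories):
    """Return the sorted list of distinct events (the 'words')."""
    return sorted({event for events in histories.values() for event in events})


def to_count_matrix(histories, vocabulary):
    """Return one row of event counts per order (the 'documents')."""
    rows = []
    for events in histories.values():
        counts = Counter(events)  # missing events count as 0
        rows.append([counts[word] for word in vocabulary])
    return rows


vocabulary = build_vocabulary(order_histories)
doc_term = to_count_matrix(order_histories, vocabulary)

for order_id, row in zip(order_histories, doc_term):
    print(order_id, dict(zip(vocabulary, row)))
```

A plain count matrix of this kind discards the order in which events occurred. If transitions between events are of interest, one possible variation (again an assumption, not necessarily the encoding used in this work) is to use consecutive event pairs as the tokens instead of single events.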