Identifying patterns of associated-conditions through topic models of Electronic Medical Records

Multiple adverse health conditions co-occurring in a patient are typically associated with poor prognosis and more frequent office or hospital visits. Methods that identify patterns of co-occurring conditions can assist in diagnosis, so identifying patterns of association among such conditions is of growing interest. In this paper, we report preliminary results from a data-driven study in which we apply a machine learning method, namely topic modeling, to Electronic Medical Records (EMRs) in order to identify patterns of associated conditions. Specifically, we use the well-established Latent Dirichlet Allocation (LDA) model, which is based on the idea that documents can be modeled as mixtures of latent topics, where each topic is a distribution over words. We adapt the LDA model to identify latent topics in patients' EMRs. We evaluate our method both qualitatively and quantitatively and show that the obtained topics align well with distinct medical phenomena characterized by co-occurring conditions.
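The abstract does not specify how EMRs are mapped onto LDA's documents and words, or which inference procedure is used. The following is a minimal sketch that assumes each patient record is treated as a document and each recorded condition as a word. It fits LDA with collapsed Gibbs sampling, which is one standard inference method for LDA and not necessarily the one used in this study. The function `lda_gibbs`, the hyperparameter values, and the toy patient records are hypothetical and serve only as illustration.

```python
import random

def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01,
              iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Assumption: a patient record is a "document" and each condition is a "word".
    Returns (theta, phi): per-document topic proportions and per-topic
    distributions over words (here, conditions).
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    ndk = [[0] * K for _ in docs]        # document-topic counts
    nkw = [[0] * V for _ in range(K)]    # topic-word counts
    nk = [0] * K                         # total tokens assigned to each topic
    z = []                               # topic assignment of every token
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)]
           for k in range(K)]
    return theta, phi

# Hypothetical toy data: six patient records over eight conditions.
conditions = ["hypertension", "obesity", "diabetes", "hyperlipidemia",
              "asthma", "copd", "bronchitis", "pneumonia"]
index = {c: i for i, c in enumerate(conditions)}
patients = [
    ["hypertension", "obesity", "diabetes", "hyperlipidemia"],
    ["diabetes", "obesity", "hypertension"],
    ["hyperlipidemia", "diabetes", "hypertension"],
    ["asthma", "copd", "bronchitis"],
    ["pneumonia", "copd", "asthma"],
    ["bronchitis", "asthma", "pneumonia"],
]
docs = [[index[c] for c in p] for p in patients]
theta, phi = lda_gibbs(docs, num_topics=2, vocab_size=len(conditions))

# Print the three most probable conditions in each topic.
for k, topic in enumerate(phi):
    top = sorted(range(len(conditions)), key=lambda w: -topic[w])[:3]
    print(f"topic {k}:", [conditions[w] for w in top])
```

Each row of `phi` is a topic, that is, a distribution over conditions, and each row of `theta` is one patient's mixture over topics. Conditions that frequently co-occur across records tend to concentrate in the same topic, which is how topics can capture patterns of associated conditions.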
