Clinical Documents Clustering Based on Medication/Symptom Names Using Multi-View Nonnegative Matrix Factorization

Clinical documents are rich free-text data sources that contain valuable medication and symptom information and have great potential to improve health care. In this paper, we build an integrated system for extracting medication names and symptom names from clinical notes. We then apply nonnegative matrix factorization (NMF) and multi-view NMF to cluster the notes into meaningful groups based on sample-feature matrices. Our experimental results show that multi-view NMF is the preferable method for clinical document clustering. Moreover, we find that clustering on extracted medication/symptom names outperforms clustering on words alone.
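To make the clustering step concrete, the following is a minimal sketch of how document clusters could be read off a nonnegative factorization of two sample-feature matrices (one view for medication names, one for symptom names). It is not the paper's implementation: it uses a simplified shared-factor variant in which all views share one document-factor matrix W, fitted with standard multiplicative updates. The function names, the toy count matrices, and the parameter values are all assumptions made for illustration.

```python
import numpy as np


def multiview_nmf(views, k, n_iter=500, seed=0, eps=1e-9):
    """Factor each view X_v (docs x features) as W @ H_v with a shared W.

    Simplified shared-factor variant (an assumption, not the paper's exact
    model): minimizes the sum over views of ||X_v - W H_v||_F^2 using
    multiplicative updates, which keep W and every H_v nonnegative.
    """
    rng = np.random.default_rng(seed)
    n_docs = views[0].shape[0]
    W = rng.random((n_docs, k)) + eps
    Hs = [rng.random((k, X.shape[1])) + eps for X in views]
    for _ in range(n_iter):
        # Update each view's feature factor with W held fixed.
        for i, X in enumerate(views):
            Hs[i] *= (W.T @ X) / (W.T @ W @ Hs[i] + eps)
        # Update the shared document factor using all views at once.
        num = sum(X @ H.T for X, H in zip(views, Hs))
        den = sum(W @ (H @ H.T) for H in Hs) + eps
        W *= num / den
    return W, Hs


def cluster_labels(W):
    """Assign each document to the factor with the largest weight."""
    return W.argmax(axis=1)


# Hypothetical toy data: 4 notes, counts of 4 medication names and 2 symptom names.
med = np.array([[2, 1, 0, 0],
                [4, 2, 0, 0],
                [0, 0, 3, 1],
                [0, 0, 3, 1]], dtype=float)
sym = np.array([[1, 0],
                [2, 0],
                [0, 2],
                [0, 2]], dtype=float)

W, Hs = multiview_nmf([med, sym], k=2)
labels = cluster_labels(W)
print(labels)
```

Passing a single matrix in the list reduces this to ordinary NMF clustering, which gives a simple way to compare single-view and multi-view behaviour on the same data.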
