Using the distance between sets of hierarchical taxonomic clinical concepts to measure patient similarity

Background: Many clinical concepts are standardized under a categorical and hierarchical taxonomy such as ICD-10 or ATC. These taxonomic clinical concepts provide insight into the semantic meaning of, and similarity among, clinical concepts and have been applied to patient similarity measures. However, how the varying sizes of taxonomic clinical concept sets affect similarity at the patient level has not been well studied.

Methods: In this paper, the most widely used taxonomic clinical concept system, ICD-10, was studied as a representative taxonomy. The distance between ICD-10-coded diagnosis sets is an integrated estimate of the information content of each concept, the similarity between each pair of concepts, and the similarity between the sets of concepts. We propose a novel set-level similarity method to calculate the distance between sets of hierarchical taxonomic clinical concepts and thereby measure patient similarity. A real-world clinical dataset with ICD-10-coded diagnoses and hospital length of stay (HLOS) information was used to evaluate the performance of various algorithms and their combinations in predicting whether a patient needs long-term hospitalization. Four subpopulation prototypes, defined by age and HLOS and having different diagnosis set sizes, were used as the targets for similarity analysis. The F-score was used to evaluate the performance of the different algorithms while controlling for other factors. We also evaluated the effect of prototype set size on prediction precision.

Results: The results identified the strengths and weaknesses of different algorithms for computing information content, code-level similarity, and set-level similarity under different contexts, such as set size and concept set background. The minimum weighted bipartite matching approach, which has not been fully recognized previously, showed unique advantages in measuring concept-based patient similarity.

Conclusions: This study provides a systematic benchmark evaluation of previous and novel algorithms used in taxonomic concept-based patient similarity, and it provides a basis for selecting appropriate methods under different clinical scenarios.
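
To make the three levels named in the abstract more concrete, the following is a minimal Python sketch of a code-level similarity combined with a minimum weighted bipartite matching distance between two diagnosis sets. It is an illustration under stated assumptions, not the method evaluated in the paper: the hierarchy is approximated by code prefixes (first letter, three-character category, full code) rather than the real ICD-10 chapter and block structure, the code-level measure is a Wu-Palmer-style depth ratio, information content is left out, and the function names `ancestors`, `code_similarity`, and `set_distance` are hypothetical. The matching is found by brute force over permutations, which is only practical for small sets; an assignment solver such as the Hungarian method would normally be used instead.

```python
from itertools import permutations


def ancestors(code):
    """Path from the top level down to the code.

    ASSUMPTION: the hierarchy is approximated by prefixes, e.g.
    'E11.9' -> ['E', 'E11', 'E11.9']. Real ICD-10 chapters and blocks
    are ranges of categories, not single letters.
    """
    code = code.strip().upper()
    path = [code[:1], code[:3]]
    if len(code) > 3:
        path.append(code)
    return path


def code_similarity(a, b):
    """Wu-Palmer-style similarity: 2 * depth(LCA) / (depth(a) + depth(b))."""
    pa, pb = ancestors(a), ancestors(b)
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    return 2.0 * shared / (len(pa) + len(pb))


def set_distance(set_a, set_b):
    """Minimum weighted bipartite matching distance between two code sets.

    Each code in the smaller set is matched to a distinct code in the
    larger set so that the total cost (1 - similarity) is minimal.
    ASSUMPTION: every unmatched code costs 1, and the total is divided
    by the size of the larger set so the result lies in [0, 1].
    """
    small, large = sorted((list(set_a), list(set_b)), key=len)
    if not large:
        return 0.0  # two empty sets are treated as identical
    # Brute force over all injective assignments; fine for small sets only.
    best = min(
        sum(1.0 - code_similarity(s, l) for s, l in zip(small, perm))
        for perm in permutations(large, len(small))
    )
    unmatched = len(large) - len(small)
    return (best + unmatched) / len(large)


if __name__ == "__main__":
    patient = {"E11.9", "I10"}
    prototype = {"E11.2", "I10", "N18.3"}
    print(round(set_distance(patient, prototype), 3))
```

Because unmatched codes are penalized, the distance grows when the two sets differ in size, which is one simple way the set-size effect discussed in the abstract can enter a set-level measure.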
