Evaluation of standard and semantically-augmented distance metrics for neurology patients

Background: Patient distances can be calculated from signs and symptoms represented as concepts in an ontological hierarchy. It is debated whether patient distance metrics that account for the semantic similarity between concepts can outperform standard patient distance metrics that ignore concept similarity. The choice of distance metric can dominate the performance of classification or clustering algorithms. Our objective was to determine whether semantically augmented distance metrics would outperform standard metrics on machine learning tasks.

Methods: We converted the neurological findings from 382 published neurology cases into sets of concepts with corresponding machine-readable codes. We calculated patient distances with four metrics: cosine distance, a semantically augmented cosine distance, Jaccard distance, and a semantically augmented bipartite distance. The two semantically augmented metrics used concept similarities derived from a hierarchical neuro-ontology. For the machine learning tasks, patient diagnoses served as ground-truth labels and patient findings as features. For each distance metric, we assessed classification accuracy with four classifiers and cluster quality with two clustering algorithms.

Results: Inter-patient distances were smaller with the semantically augmented metrics. Classification accuracy and cluster quality did not differ significantly by distance metric.

Conclusion: Although semantic augmentation reduced inter-patient distances, the semantically augmented patient distance metrics did not improve classification accuracy or cluster quality when applied to a dataset of neurology patients. Further work is needed to assess the utility of semantically augmented patient distances.
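The four distance metrics named in the Methods can be illustrated on patients represented as sets of concepts. What follows is a minimal sketch under stated assumptions, not the study's implementation. The toy hierarchy and its concept names are hypothetical. Concept similarity is computed with Wu-Palmer similarity, which is an assumption because the abstract does not name the similarity measure taken from the neuro-ontology. The semantically augmented cosine is written as a soft cosine, in which binary vectors are compared through a concept-similarity matrix. The semantically augmented bipartite distance is approximated by a best-match average over the bipartite graph of concept pairs rather than by an optimal matching.

```python
import math
from itertools import product

# Hypothetical toy is-a hierarchy (child -> parent; the root maps to None).
# These concepts are illustrative and are NOT taken from the study's neuro-ontology.
PARENT = {
    "finding": None,
    "motor finding": "finding",
    "weakness": "motor finding",
    "hemiparesis": "weakness",
    "paraparesis": "weakness",
    "sensory finding": "finding",
    "numbness": "sensory finding",
}


def path_to_root(concept):
    """Return [concept, parent, ..., root]."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth of a concept, counting the root as depth 1."""
    return len(path_to_root(concept))


def wu_palmer(c1, c2):
    """Wu-Palmer similarity: 2 * depth(LCA) / (depth(c1) + depth(c2))."""
    ancestors_of_c2 = set(path_to_root(c2))
    # Walking up from c1, the first shared ancestor is the lowest common ancestor.
    lca = next(a for a in path_to_root(c1) if a in ancestors_of_c2)
    return 2 * depth(lca) / (depth(c1) + depth(c2))


def jaccard_distance(a, b):
    """Standard Jaccard distance between two non-empty concept sets."""
    return 1 - len(a & b) / len(a | b)


def cosine_distance(a, b):
    """Standard cosine distance; for binary vectors the dot product is |a & b|."""
    return 1 - len(a & b) / math.sqrt(len(a) * len(b))


def soft_cosine_distance(a, b, sim=wu_palmer):
    """Soft cosine distance: binary vectors compared through a concept-similarity matrix S."""
    def inner(x, y):
        # x^T S y for binary indicator vectors = sum of all pairwise concept similarities.
        return sum(sim(i, j) for i, j in product(x, y))
    return 1 - inner(a, b) / math.sqrt(inner(a, a) * inner(b, b))


def bma_distance(a, b, sim=wu_palmer):
    """Best-match-average distance: each concept is paired with its most similar
    concept in the other set, and the two directional means are averaged."""
    a_to_b = sum(max(sim(i, j) for j in b) for i in a) / len(a)
    b_to_a = sum(max(sim(j, i) for i in a) for j in b) / len(b)
    return 1 - (a_to_b + b_to_a) / 2


# Two hypothetical patients that differ only in the type of weakness.
patient_a = {"hemiparesis", "numbness"}
patient_b = {"paraparesis", "numbness"}
for name, metric in [("Jaccard", jaccard_distance), ("cosine", cosine_distance),
                     ("soft cosine", soft_cosine_distance),
                     ("best-match average", bma_distance)]:
    print(f"{name}: {metric(patient_a, patient_b):.3f}")
# Jaccard: 0.667
# cosine: 0.500
# soft cosine: 0.097
# best-match average: 0.125
```

In this toy hierarchy, both semantically augmented distances are much smaller than their standard counterparts because hemiparesis and paraparesis share the parent concept weakness. This illustrates how semantic augmentation can shrink inter-patient distances. It does not reproduce the study's data. If a strict one-to-one pairing of concepts is wanted, an optimal bipartite assignment (for example, the Hungarian algorithm) could replace the best-match average.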
