A SNOMED supported ontological vector model for subclinical disorder detection using EHR similarity

Electronic Health Records (EHR) are a valuable resource in healthcare because they provide clinical evidence that can help identify potential complications and support decisions on early intervention. Simple string matching, the common search algorithm, cannot map a query to similar health records in a database with respect to medical concepts. This paper proposes a novel ontological vector model, supported by the Systematized Nomenclature of Medicine Clinical Terms (SNOMED-CT), that projects the disease terms of a health record onto a feature space so that each record is characterized by a feature vector, which serves as a fingerprint of the record. The similarity between the query and the health records in the database was measured in two ways: by similarity measures on their feature vectors and by a string matching score. Three similarity measures were considered: Euclidean distance (ED), direction cosine (DC) and modified direction cosine (mDC). Medical history and carotid ultrasound imaging findings were collected from 47 subjects in Hong Kong. The dataset formed 1081 pairs of health records, and ROC analysis was used to evaluate and compare the accuracy of the ontological vector model and simple string matching against whether the two subjects in a pair agreed on the presence or absence of carotid plaques identified by carotid ultrasound. The score generated by simple string matching behaved as a random rater, whereas the score from the ontological vector model did not. In other words, the degree of health record similarity based on the ontological vector model is associated with the agreement of atherosclerosis status between two patients. The vector model using feature terms at SNOMED-CT level 4 gave the best performance. The performance of mDC was very close to that of ED and DC, but the properties of mDC make it more suitable for retrieving similar health records. The ontological vector model was further enhanced by a support vector classifier approach.
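The idea of characterizing each record by a feature vector and comparing vectors can be illustrated with a minimal sketch. It is not the authors' implementation: the term-to-feature mapping `TERM_TO_FEATURE` is a hypothetical toy table standing in for a SNOMED-CT hierarchy lookup, the count-based vector is an assumed encoding, and only ED and DC are shown because the abstract does not define mDC. The last function checks that 47 subjects give 1081 unordered pairs.

```python
import math

# Hypothetical toy mapping from a disease term to an ancestor "feature term".
# A real system would derive this from the SNOMED-CT hierarchy (assumption).
TERM_TO_FEATURE = {
    "hypertension": "cardiovascular disorder",
    "angina": "cardiovascular disorder",
    "type 2 diabetes": "endocrine disorder",
    "asthma": "respiratory disorder",
}
FEATURES = sorted(set(TERM_TO_FEATURE.values()))


def feature_vector(disease_terms):
    """Count how many of a record's disease terms fall under each feature term."""
    vec = [0.0] * len(FEATURES)
    for term in disease_terms:
        feature = TERM_TO_FEATURE.get(term)
        if feature is not None:  # terms missing from the toy table are ignored
            vec[FEATURES.index(feature)] += 1.0
    return vec


def euclidean_distance(u, v):
    """ED: smaller values mean more similar records."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def direction_cosine(u, v):
    """DC: cosine of the angle between two vectors; 0.0 if either is all zeros."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (norm_u * norm_v)


def count_record_pairs(n_subjects):
    """Number of unordered pairs of records, n choose 2."""
    return math.comb(n_subjects, 2)


if __name__ == "__main__":
    query = feature_vector(["hypertension", "angina"])
    record = feature_vector(["hypertension", "type 2 diabetes"])
    print(euclidean_distance(query, record), direction_cosine(query, record))
    print(count_record_pairs(47))
```

Note that ED is a distance and DC is a similarity, so the two rank records in opposite directions; any comparison between them has to account for that.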
