Evaluation of the quality of information retrieval of clinical findings from a computerized patient database using a semantic terminological model.

OBJECTIVES To measure the strength of agreement between the concepts and records retrieved from a computerized patient database by a semantic terminological model for clinical findings, in response to physician-derived questions, and the concepts and records identified manually by clinicians. The performance of the semantic terminological model is also compared with the more established retrieval methods of free-text search, ICD-10, and hierarchic retrieval.

DESIGN A clinical database (Diabeta) of 106,000 patient problem record entries containing 2,625 unique concepts in a clinical academic department was used to compare semantic, free-text, ICD-10, and hierarchic data retrieval against a gold standard in response to a battery of 47 clinical questions.

MEASUREMENTS The performance of concept and record retrieval was expressed as mean detection rate, positive predictive value, Yates corrected and Mantel-Haenszel chi-squared values, and Cohen kappa value, with significance estimated using the Mann-Whitney test.

RESULTS The semantic terminological model used to retrieve clinically useful concepts from a patient database performed well and better than the other methods, with a mean detection rate of 0.86, a positive predictive value of 0.96, a Yates corrected chi-squared value of 1,537, a Mantel-Haenszel chi-squared value of 19,302, and a Cohen kappa of 0.88. Results for record retrieval were even better, with a mean record detection rate of 0.94, a positive predictive value of 0.99, a Yates corrected chi-squared value of 94,774, a Mantel-Haenszel chi-squared value of 1,550,356, and a Cohen kappa value of 0.94. The mean detection rate, Yates corrected chi-squared value, and Cohen kappa value for semantic retrieval were significantly better than for the other methods.

CONCLUSION The use of a semantic terminological model in this test scenario provides an effective framework for representing clinical finding concepts and their relationships. Although currently incomplete, the model supports improved information retrieval from a patient database in response to clinically relevant questions when compared with alternative methods of analysis.
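The agreement measures named above can all be derived from a 2x2 table that compares what a retrieval method returned with the gold standard. The following is a minimal sketch using the usual textbook formulas, not the study's own analysis code: the class name `Confusion`, its field names, and the example counts are hypothetical, the Mantel-Haenszel statistic is given in its single-table form, and all four table margins are assumed to be non-zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical 2x2 table: one retrieval method versus the gold standard."""
    tp: int  # retrieved and in the gold standard
    fp: int  # retrieved but not in the gold standard
    fn: int  # in the gold standard but not retrieved
    tn: int  # neither retrieved nor in the gold standard

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def _margins(self) -> int:
        # Product of the two row totals and the two column totals.
        return ((self.tp + self.fp) * (self.fn + self.tn)
                * (self.tp + self.fn) * (self.fp + self.tn))

    def _cross(self) -> int:
        # Cross-product difference ad - bc of the 2x2 table.
        return self.tp * self.tn - self.fp * self.fn

    def detection_rate(self) -> float:
        # Fraction of gold-standard items that were retrieved (sensitivity).
        return self.tp / (self.tp + self.fn)

    def positive_predictive_value(self) -> float:
        # Fraction of retrieved items that are in the gold standard.
        return self.tp / (self.tp + self.fp)

    def yates_chi_squared(self) -> float:
        # Chi-squared with Yates continuity correction (clamped at zero).
        corrected = max(abs(self._cross()) - self.n / 2, 0.0)
        return self.n * corrected ** 2 / self._margins()

    def mantel_haenszel_chi_squared(self) -> float:
        # Single-table Mantel-Haenszel form: (n - 1)(ad - bc)^2 / margins.
        return (self.n - 1) * self._cross() ** 2 / self._margins()

    def cohen_kappa(self) -> float:
        # Observed agreement corrected for agreement expected by chance.
        p_observed = (self.tp + self.tn) / self.n
        p_expected = ((self.tp + self.fp) * (self.tp + self.fn)
                      + (self.fn + self.tn) * (self.fp + self.tn)) / self.n ** 2
        return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Illustrative counts only; these are not data from the study.
    table = Confusion(tp=40, fp=10, fn=10, tn=40)
    print(table.detection_rate(), table.positive_predictive_value())
    print(table.yates_chi_squared(), table.mantel_haenszel_chi_squared())
    print(table.cohen_kappa())
```

In a setup like the one described, one such table would be built per clinical question and per retrieval method, and the per-question values would then be averaged or compared across methods.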
