Discovering hidden knowledge through auditing clinical diagnostic knowledge bases

OBJECTIVE To evaluate the potential of data mining auditing techniques to identify hidden concepts in diagnostic knowledge bases (KBs). Improving KB completeness enhances KB applications such as differential diagnosis and patient case simulation.

MATERIALS AND METHODS The authors used unsupervised methods (Pearson's correlation, PC; Kendall's correlation, KC; and a heuristic algorithm, HA) to identify existing and discover new finding-finding interrelationships ("properties") in the INTERNIST-1/QMR KB. The authors also estimated the KB maintenance efficiency gains (effort reduction) of these approaches.

RESULTS The methods discovered new properties at rates with 95% confidence intervals of [0.1%, 5.4%] (PC), [2.8%, 12.5%] (KC), and [5.6%, 18.8%] (HA). The estimated reduction in manual effort for HA-assisted determination of new properties was approximately 50-fold.

CONCLUSION Data mining can serve as an efficient supplement for ensuring the completeness of finding-finding interdependencies in diagnostic knowledge bases. The authors' findings should be applicable to other diagnostic systems that record finding frequencies within diseases (e.g., DXplain, ISABEL).
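The abstract does not say how the correlation-based audits were computed. The following is a minimal sketch of one plausible reading. It assumes each finding is represented by a vector of its frequency values across diseases, and that finding pairs with strongly correlated vectors are flagged as candidate properties for expert review. The toy data, the function names, and the 0.8 threshold are hypothetical and are not taken from the study. The heuristic algorithm (HA) is not sketched because the abstract does not describe it.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data (not from INTERNIST-1/QMR): each finding maps to its
# frequency score (1 = rare, 5 = nearly always) in six made-up diseases.
FREQ = {
    "fever":    [5, 4, 1, 2, 5, 1],
    "chills":   [4, 4, 1, 1, 5, 2],
    "jaundice": [1, 2, 5, 4, 1, 3],
}


def pearson(x, y):
    """Pearson's correlation coefficient; assumes neither vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def kendall_tau_b(x, y):
    """Kendall's tau-b, which corrects for ties (common on a 1-5 scale).

    Assumes neither vector is constant.
    """
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 or dy == 0:
            ties_x += dx == 0  # pair tied on x (possibly on y as well)
            ties_y += dy == 0  # pair tied on y (possibly on x as well)
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / sqrt((n_pairs - ties_x) * (n_pairs - ties_y))


def screen_pairs(freq, threshold=0.8):
    """Return finding pairs whose |PC| or |KC| meets the (hypothetical) threshold."""
    flagged = []
    for a, b in combinations(sorted(freq), 2):
        r = pearson(freq[a], freq[b])
        tau = kendall_tau_b(freq[a], freq[b])
        if abs(r) >= threshold or abs(tau) >= threshold:
            flagged.append((a, b, r, tau))
    return flagged


if __name__ == "__main__":
    for a, b, r, tau in screen_pairs(FREQ):
        print(f"{a} ~ {b}: PC={r:.2f}, KC={tau:.2f}")
```

In a screen of this kind, flagged pairs are only candidates: an expert would still need to decide whether a flagged pair reflects a genuine finding-finding property or an incidental correlation.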
