Decision Support Methods for Finding Phenotype-Disorder Associations in the Bone Dysplasia Domain

A lack of mature domain knowledge and well-established guidelines makes the medical diagnosis of skeletal dysplasias (a group of rare genetic disorders) a very complex process. Machine learning techniques can facilitate objective interpretation of medical observations for the purposes of decision support. However, building decision support models with such techniques is highly problematic for rare genetic disorders, because it depends on access to mature domain knowledge. This paper describes an approach for developing a decision support model in medical domains that are underpinned by relatively sparse knowledge bases. We propose a solution that combines association rule mining with Dempster-Shafer theory (DST) to compute probabilistic associations between sets of clinical features and disorders, which can then serve as support for medical decision making (e.g., diagnosis). We show experimentally that our approach provides meaningful outcomes even on small datasets with sparse distributions, outperforms other machine learning techniques, and performs slightly better than an initial diagnosis by a clinician.
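
To make the DST component more concrete, the following is a minimal sketch of Dempster's rule of combination, the standard way of fusing two independent bodies of evidence in Dempster-Shafer theory. It is not the authors' implementation. As an illustrative assumption, each mined association rule is treated as a simple mass function that assigns its confidence to the disorder set it implies and the remainder to the whole frame of discernment. The disorder labels, confidence values, and function names are all hypothetical.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function is a dict mapping a frozenset of hypotheses
    to a mass in [0, 1]; the masses in each dict should sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence: mass goes to the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint focal sets contribute to the conflict mass K.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Normalise by 1 - K so the combined masses sum to 1 again.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

def belief(m, hypothesis):
    """Bel(H): total mass of focal sets fully contained in H."""
    return sum(w for s, w in m.items() if s <= hypothesis)

def plausibility(m, hypothesis):
    """Pl(H): total mass of focal sets that intersect H."""
    return sum(w for s, w in m.items() if s & hypothesis)

# Hypothetical frame of discernment: three candidate disorders.
FRAME = frozenset({"D1", "D2", "D3"})

def rule_to_mass(disorders, confidence):
    """Assumed encoding: rule confidence supports the implied disorders,
    and the remaining mass stays uncommitted on the whole frame."""
    return {frozenset(disorders): confidence, FRAME: 1.0 - confidence}

# Two hypothetical rules fired by two observed clinical features.
m_feature_a = rule_to_mass({"D1"}, 0.6)
m_feature_b = rule_to_mass({"D2"}, 0.5)
fused = combine(m_feature_a, m_feature_b)

for focal_set, mass in sorted(fused.items(), key=lambda kv: -kv[1]):
    print(sorted(focal_set), round(mass, 4))
print("Bel(D1) =", round(belief(fused, frozenset({"D1"})), 4))
print("Pl(D1)  =", round(plausibility(fused, frozenset({"D1"})), 4))
```

The gap between belief and plausibility for a disorder reflects how much evidence remains uncommitted, which is one reason DST is often considered for sparse data: mass left on the whole frame represents ignorance explicitly rather than forcing it onto individual disorders.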
