A decision support system to follow up and diagnose primary headache patients using semantically enriched data

Background: Headache disorders are an important health burden with a large health-economic impact worldwide. Current treatment and follow-up processes are often archaic, creating opportunities for computer-aided and decision support systems to increase their efficiency. Existing systems are mostly purely data-driven and rely on black-box models, which harms interpretability and transparency, both key requirements for deployment in a clinical setting.

Methods: In this paper, a decision support system is proposed, composed of three components: (i) a cross-platform mobile application that captures the data required from patients to formulate a diagnosis, (ii) an automated diagnosis support module that generates an interpretable decision tree, based on data semantically annotated with expert knowledge, to support physicians in formulating the correct diagnosis, and (iii) a web application that lets the physician efficiently interpret the captured data and learned insights by means of visualizations.

Results: We show that decision tree induction techniques achieve competitive accuracy rates, compared to other black- and white-box techniques, on a publicly available dataset referred to as migbase. Migbase contains aggregated information on headache attacks from 849 patients. Each sample is labeled with one of three possible primary headache disorders. We demonstrate that balancing the dataset using prior expert knowledge reduces the classification error by more than 10%, a statistically significant improvement (p ≤ 0.05). Furthermore, we achieve high accuracy rates with features extracted using the Weisfeiler-Lehman kernel, which is completely unsupervised. This makes it an ideal approach to solve a potential cold-start problem.

Conclusion: Decision trees are the perfect candidate for the automated diagnosis support module. They achieve predictive performance competitive with other techniques on the migbase dataset and are, foremost, completely interpretable. Moreover, the incorporation of prior knowledge increases both the predictive performance and the transparency of the resulting predictive model on the studied dataset.
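
The unsupervised feature extraction mentioned in the results can be illustrated with a minimal sketch of Weisfeiler-Lehman relabeling on a small labeled graph. This is not the authors' implementation, which operates on RDF data: the function names `wl_features` and `wl_kernel`, the undirected adjacency-dictionary representation, and the toy node labels are all assumptions made for illustration.

```python
from collections import Counter


def wl_features(adjacency, labels, iterations=1):
    """Count Weisfeiler-Lehman labels for one graph (illustrative sketch).

    adjacency: dict mapping each node to a list of its neighbours
    labels:    dict mapping each node to its initial string label
    """
    current = dict(labels)
    counts = Counter(current.values())  # iteration 0: the original labels
    for _ in range(iterations):
        updated = {}
        for node, neighbours in adjacency.items():
            # New label = own label plus the sorted multiset of neighbour labels.
            neighbour_labels = sorted(current[n] for n in neighbours)
            updated[node] = current[node] + "(" + ",".join(neighbour_labels) + ")"
        current = updated
        counts.update(current.values())
    return counts


def wl_kernel(counts_a, counts_b):
    """Dot product of two label-count vectors."""
    shared = counts_a.keys() & counts_b.keys()
    return sum(counts_a[k] * counts_b[k] for k in shared)


# Hypothetical toy graphs: a patient linked to an attack linked to a symptom.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
graph_1 = wl_features(adjacency, {"a": "Patient", "b": "Attack", "c": "Nausea"})
graph_2 = wl_features(adjacency, {"a": "Patient", "b": "Attack", "c": "Photophobia"})

similarity_same = wl_kernel(graph_1, graph_1)
similarity_diff = wl_kernel(graph_1, graph_2)
```

No class labels are used at any point, which is why such features remain available before any diagnosed samples exist. The resulting count vectors could then be passed to any classifier, including a decision tree.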

[33]  Shannon J. Lane,et al.  Bmc Medical Informatics and Decision Making a Review of Randomized Controlled Trials Comparing the Effectiveness of Hand Held Computers with Paper Methods for Data Collection , 2006 .

[34]  Tianqi Chen,et al.  XGBoost: A Scalable Tree Boosting System , 2016, KDD.

[35]  Scott Lundberg,et al.  A Unified Approach to Interpreting Model Predictions , 2017, NIPS.

[36]  V. Martin The Diagnostic Evaluation of Secondary Headache Disorders , 2011, Headache.

[37]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[38]  Amos S Hundert,et al.  Commercially Available Mobile Phone Headache Diary Apps: A Systematic Review , 2014, JMIR mHealth and uHealth.

[39]  Bartosz Krawczyk,et al.  Automatic diagnosis of primary headaches by machine learning methods , 2013 .

[40]  Seetha Hari,et al.  Learning From Imbalanced Data , 2019, Advances in Computer and Electrical Engineering.

[41]  Patrick Blake,et al.  Clinical decision support systems for improving diagnostic accuracy and achieving precision medicine , 2015, Journal of Clinical Bioinformatics.

[42]  Martin Mozina,et al.  Orange: data mining toolbox in python , 2013, J. Mach. Learn. Res..

[43]  M. De Hert,et al.  Cost of disorders of the brain in Europe. , 2006, European journal of neurology.

[44]  Begonya Garcia-Zapirain,et al.  Automatic migraine classification via feature selection committee and machine learning techniques over imaging and questionnaire data , 2017, BMC Medical Informatics and Decision Making.

[45]  J. Pascual,et al.  Epidemiology of headache in Europe , 2006, European journal of neurology.

[46]  Gilles Vandewiele,et al.  Enhancing White-Box Machine Learning Processes by Incorporating Semantic Background Knowledge , 2017, ESWC.

[47]  Haibo He,et al.  ADASYN: Adaptive synthetic sampling approach for imbalanced learning , 2008, 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence).

[48]  A. Felício,et al.  Epidemiology of primary and secondary headaches in a Brazilian tertiary-care center. , 2006, Arquivos de neuro-psiquiatria.

[49]  R. Ohrbach,et al.  The International Classification of Headache Disorders, 3rd edition (beta version) , 2013, Cephalalgia : an international journal of headache.

[50]  S. V. N. Vishwanathan,et al.  Graph kernels , 2007 .

[51]  Alberto Maria Segre,et al.  Programs for Machine Learning , 1994 .

[52]  Alan D. Lopez,et al.  The Global Burden of Disease Study , 2003 .

[53]  Wei-Yin Loh,et al.  Classification and regression trees , 2011, WIREs Data Mining Knowl. Discov..

[54]  Huilong Duan,et al.  A clinical decision support system for primary headache disorder based on hybrid intelligent reasoning , 2014, 2014 7th International Conference on Biomedical Engineering and Informatics.

[55]  Dhiya Al-Jumeily,et al.  An Intelligent Systems Approach to Primary Headache Diagnosis , 2017, ICIC.

[56]  L. Crevits,et al.  Diagnostic and therapeutic trajectory of cluster headache patients in Flanders. , 2009, Acta neurologica Belgica.

[57]  J. Olesen,et al.  Premonitory symptoms in migraine , 2003, Neurology.

[58]  Allan Hanbury,et al.  Machine learning framework incorporating expert knowledge in tissue image annotation , 2016, 2016 23rd International Conference on Pattern Recognition (ICPR).

[59]  Dan J Stein,et al.  Global, regional, and national disability-adjusted life-years (DALYs) for 333 diseases and injuries and healthy life expectancy (HALE) for 195 countries and territories, 1990–2016: a systematic analysis for the Global Burden of Disease Study 2016 , 2017, Lancet.

[60]  Kent A. Spackman,et al.  SNOMED RT: a reference terminology for health care , 1997, AMIA.

[61]  Hao Wang,et al.  Semantic data mining: A survey of ontology-based approaches , 2015, Proceedings of the 2015 IEEE 9th International Conference on Semantic Computing (IEEE ICSC 2015).

[62]  M. Ferrari,et al.  Features involved in the diagnostic delay of cluster headache , 2003, Journal of neurology, neurosurgery, and psychiatry.

[63]  Donato Malerba,et al.  A Comparative Analysis of Methods for Pruning Decision Trees , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[64]  Carlos Guestrin,et al.  "Why Should I Trust You?": Explaining the Predictions of Any Classifier , 2016, ArXiv.

[65]  Dan J Stein,et al.  Global, regional, and national incidence, prevalence, and years lived with disability for 301 acute and chronic diseases and injuries in 188 countries, 1990–2013: a systematic analysis for the Global Burden of Disease Study 2013 , 2015, The Lancet.

[66]  M. Russell Genetics of tension-type headache , 2007, The Journal of Headache and Pain.

[67]  J. Remon,et al.  Self‐medication of regular headache: a community pharmacy‐based survey , 2012, European journal of neurology.

[68]  J. Olesen,et al.  Diaries and Calendars for Migraine. A Review , 2006, Cephalalgia : an international journal of headache.

[69]  Anitha Kannan,et al.  Development and Evaluation of an iPad App for Measuring the Cost of a Nutritious Diet , 2014, JMIR mHealth and uHealth.