Automatically explaining machine learning prediction results: a demonstration on type 2 diabetes risk prediction

Background: Predictive modeling is a key component of solutions to many healthcare problems. Among all predictive modeling approaches, machine learning methods often achieve the highest prediction accuracy, but they suffer from a long-standing open problem that precludes their widespread use in healthcare: most machine learning models give no explanation for their prediction results, whereas interpretability is essential for a predictive model to be adopted in typical healthcare settings.

Methods: This paper presents the first complete method for automatically explaining results for any machine learning predictive model without degrading accuracy. We implemented the method in software. Using the electronic medical record data set from the Practice Fusion diabetes classification competition, which contains patient records from all 50 states in the United States, we demonstrated the method on predicting type 2 diabetes diagnosis within the next year.

Results: For the champion machine learning model of the competition, our method explained prediction results for 87.4% of patients whom the model correctly predicted to have a type 2 diabetes diagnosis within the next year.

Conclusions: Our demonstration showed the feasibility of automatically explaining results for any machine learning predictive model without degrading accuracy.
