Application of machine learning for hematological diagnosis

Quick and accurate medical diagnosis is crucial for the successful treatment of a disease. Using machine learning algorithms, we built two models that predict a hematologic disease from laboratory blood test results. One model uses all available blood test parameters. The other uses a reduced set of parameters that is usually measured when a patient is admitted. Both models performed well. When the five most probable diseases were considered, their prediction accuracy was 0.88 and 0.86, respectively; when only the most probable disease was considered, it was 0.59 and 0.57. The two models did not differ significantly from each other. This indicates that the reduced set of parameters already contains a relevant fingerprint of a disease, which makes the model more useful to general practitioners. It also suggests that blood test results carry more information than physicians recognize. In a clinical test, we showed that the accuracy of our predictive models was on a par with that of hematology specialists. Our study is the first to show that a machine learning predictive model based on blood tests alone can be successfully applied to predict hematologic diseases, and it could open up unprecedented possibilities in medical diagnosis.
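The two accuracy figures differ only in how many ranked candidates count as a hit. The single-disease figure requires the most probable disease to be the correct one. The five-disease figure accepts a correct diagnosis anywhere among the five most probable. The following is a minimal Python sketch of this "top-k accuracy". It assumes that each model outputs one probability per candidate disease. The function name `top_k_accuracy` and the toy data are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence


def top_k_accuracy(probabilities: Sequence[Sequence[float]],
                   true_labels: Sequence[int],
                   k: int) -> float:
    """Fraction of cases whose true disease is among the k most probable predictions.

    Hypothetical helper: probabilities[i][j] is the predicted probability that
    patient i has disease j, and true_labels[i] is the index of the confirmed disease.
    """
    if len(probabilities) != len(true_labels):
        raise ValueError("probabilities and true_labels must have the same length")
    if not probabilities:
        raise ValueError("need at least one case")
    hits = 0
    for probs, label in zip(probabilities, true_labels):
        # Rank disease indices by predicted probability, highest first.
        # sorted() is stable, so tied probabilities keep their original index order.
        ranked = sorted(range(len(probs)), key=lambda j: probs[j], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Hypothetical toy data: 4 patients, 6 candidate diseases (indices 0-5).
probs = [
    [0.50, 0.20, 0.10, 0.10, 0.05, 0.05],  # true disease 0 is ranked 1st
    [0.30, 0.25, 0.20, 0.15, 0.06, 0.04],  # true disease 3 is ranked 4th
    [0.05, 0.10, 0.15, 0.20, 0.22, 0.28],  # true disease 0 is ranked 6th
    [0.10, 0.40, 0.20, 0.10, 0.10, 0.10],  # true disease 2 is ranked 2nd
]
labels = [0, 3, 0, 2]

print(top_k_accuracy(probs, labels, k=1))  # 0.25
print(top_k_accuracy(probs, labels, k=5))  # 0.75
```

With k=1, only one of the four toy patients counts as correct. With k=5, three do. Top-k accuracy can never decrease as k grows, which is why a five-candidate accuracy is always at least as high as the single-candidate accuracy of the same model.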