An application of machine learning to haematological diagnosis

Quick and accurate medical diagnoses are crucial for the successful treatment of diseases. Using machine learning algorithms trained on laboratory blood test results, we built two models to predict haematologic diseases. One predictive model used all the available blood test parameters and the other used only a reduced set that is usually measured upon patient admission. Both models produced good results, obtaining prediction accuracies of 0.88 and 0.86, respectively, when considering the list of the five most likely diseases and 0.59 and 0.57 when considering only the most likely disease. The models did not differ significantly, which indicates that a reduced set of parameters can represent a relevant “fingerprint” of a disease. This finding expands the models’ utility for general practitioners and indicates that blood test results contain more information than physicians generally recognize. A clinical test showed that the accuracy of our predictive models was on par with that of haematology specialists. Our study is the first to show that a machine learning predictive model based on blood tests alone can be successfully applied to predict haematologic diseases. This result could open up unprecedented possibilities for medical diagnosis.
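The two kinds of accuracy reported above (most likely disease versus the five most likely diseases) can be read as top-1 and top-k accuracy. The following is a minimal sketch of that metric, not the authors' evaluation code: the function name `top_k_accuracy`, the three disease classes, and all scores are made up for illustration and are not data from the study.

```python
def top_k_accuracy(probabilities, true_labels, k):
    """Fraction of cases whose true label is among the k highest-scoring classes."""
    hits = 0
    for scores, truth in zip(probabilities, true_labels):
        # Rank class indices from most to least likely and keep the first k.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Toy, made-up scores for 4 patients over 3 hypothetical disease classes.
toy_probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.4, 0.35, 0.25],
    [0.1, 0.2, 0.7],
]
toy_truth = [0, 2, 1, 2]  # index of the true class for each patient

top1 = top_k_accuracy(toy_probs, toy_truth, k=1)
top2 = top_k_accuracy(toy_probs, toy_truth, k=2)
print(top1, top2)
```

With k=1 only the single highest-scoring class counts as a hit; a larger k credits the model whenever the true disease appears anywhere in its shortlist, which is why the reported five-disease accuracies are higher than the single-disease ones.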
