Machine learning with asymmetric abstention for biomedical decision-making

Machine learning and artificial intelligence have entered biomedical decision-making for diagnostics, prognostics, and therapy recommendations. However, their predictions must be interpreted with care because errors can have severe consequences for patients. In contrast to human decision-makers, computational models typically return a decision even when their confidence is low. Machine learning with abstention better reflects human decision-making by introducing a reject option for samples with low confidence. Abstention intervals are typically symmetric around the decision boundary. In the current study, we use asymmetric abstention intervals, which we demonstrate to be better suited to biomedical data, which are typically highly imbalanced. We evaluate symmetric and asymmetric abstention on three real-world biomedical datasets and show that both approaches can significantly improve classification performance. However, asymmetric abstention rejects as many samples as symmetric abstention or fewer, and should therefore be preferred for imbalanced data.
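To make the difference between the two interval types concrete, the following is a minimal sketch of a reject option on classifier scores. It is not the implementation used in the study: the function names, the decision boundary at 0.5, the interval bounds, and the toy scores and labels are all assumptions chosen for illustration only.

```python
from typing import List, Optional, Sequence


def predict_with_abstention(
    scores: Sequence[float], lower: float, upper: float
) -> List[Optional[int]]:
    """Map scores in [0, 1] to 1, 0, or None (abstain).

    Scores strictly between `lower` and `upper` are rejected.
    The interval is symmetric if 0.5 - lower == upper - 0.5
    (assuming a decision boundary at 0.5), asymmetric otherwise.
    """
    if not lower <= upper:
        raise ValueError("lower must not exceed upper")
    predictions: List[Optional[int]] = []
    for score in scores:
        if score >= upper:
            predictions.append(1)
        elif score <= lower:
            predictions.append(0)
        else:
            predictions.append(None)  # low confidence: abstain
    return predictions


def rejection_rate(predictions: Sequence[Optional[int]]) -> float:
    """Fraction of samples on which the model abstained."""
    return sum(p is None for p in predictions) / len(predictions)


def accuracy_on_accepted(
    predictions: Sequence[Optional[int]], labels: Sequence[int]
) -> Optional[float]:
    """Accuracy over non-rejected samples, or None if all were rejected."""
    accepted = [(p, y) for p, y in zip(predictions, labels) if p is not None]
    if not accepted:
        return None
    return sum(p == y for p, y in accepted) / len(accepted)


# Hypothetical toy data, not taken from the study's datasets.
scores = [0.05, 0.10, 0.20, 0.35, 0.45, 0.55, 0.62, 0.90]
labels = [0, 0, 0, 0, 1, 0, 1, 1]

# Symmetric interval around 0.5 versus an asymmetric one that is
# narrower on the side of class 0 (bounds are illustrative assumptions).
symmetric = predict_with_abstention(scores, lower=0.3, upper=0.7)
asymmetric = predict_with_abstention(scores, lower=0.4, upper=0.7)

print(rejection_rate(symmetric), accuracy_on_accepted(symmetric, labels))
print(rejection_rate(asymmetric), accuracy_on_accepted(asymmetric, labels))
```

In this toy case the asymmetric interval accepts one additional sample without introducing an error among the accepted ones. How the bounds should be chosen for a real dataset is not specified here and would need to be tuned, for example on held-out data.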
