Explainable multimodal machine learning model for classifying pregnancy drug safety

MOTIVATION Teratogenic drugs can cause severe fetal malformation and therefore have a critical impact on the health of the fetus, yet the teratogenic risks of most approved drugs are unknown. This paper proposes an explainable machine learning model for classifying pregnancy drug safety based on multimodal data, and suggests an orthogonal ensemble for modeling multimodal data. To train the proposed model, we created a set of labeled drugs by processing over 100,000 textual responses collected by a large teratology information service. Structured textual information is incorporated into the model by applying clustering analysis to textual features.

RESULTS We report an area under the receiver operating characteristic curve (AUC) of 0.891 using cross-validation and an AUC of 0.904 for cross-expert validation. Our findings suggest that two drugs, Varenicline and Mebeverine, are safe during pregnancy, and that Meloxicam, an NSAID, carries a higher risk; according to existing data, the safety of these three drugs during pregnancy is unknown. We also present a web-based application that enables physicians to examine a specific drug and its risk factors.

AVAILABILITY AND IMPLEMENTATION The code is available from https://github.com/goolig/drug_safety_pregnancy_prediction.git.

SUPPLEMENTARY INFORMATION The labeled lists of drugs are available from https://icc.ise.bgu.ac.il/medical_ai/drug_preg/full/.
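
For readers unfamiliar with the AUC metric reported above, the following is a minimal sketch of how it can be computed from a classifier's scores. It is not taken from the authors' repository: the function name `roc_auc`, the toy labels, and the toy scores are all hypothetical, and the encoding (1 = higher-risk drug, 0 = safe drug) is an assumption made only for illustration.

```python
def roc_auc(labels, scores):
    """Return the AUC for binary labels (1 or 0) and real-valued scores.

    The AUC equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example, with
    ties counted as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC is undefined unless both classes are present")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = higher-risk drug, 0 = safe drug (assumed encoding).
toy_labels = [1, 0, 1, 0, 1]
toy_scores = [0.9, 0.2, 0.4, 0.5, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
```

In a cross-validation setting, one common choice is to compute this value on each held-out fold and average the results; whether the reported figures were averaged per fold or computed on pooled predictions is not stated here.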