Highly accurate and explainable detection of specimen mix-up using a machine learning model

Abstract

Background: Delta check is widely used to detect specimen mix-ups. However, because its specificity is inadequate and true mix-ups are rare, its positive predictive value (PPV) is very low, and identifying true mix-up errors among a large number of false alerts is labor-intensive. To overcome this problem, we developed a more accurate detection model using machine learning.

Methods: Inspired by delta check, we compared each examination with the patient's past examinations, but over a broader time range. Fifteen common items were selected from complete blood cell counts and biochemical tests. We considered examinations performed in our hospital in which ≥11 of the 15 items were measured simultaneously, and from each patient's consecutive examinations we created partial time-series data using a sliding window of size 4. The last examinations of the partial time series were shuffled to generate artificial mix-up cases. After splitting the dataset into development and validation sets, we trained a gradient-boosting decision tree (GBDT) model on the development set to detect whether the last examination in each partial time series was an artificial mix-up. The model’s performance was evaluated on the validation set.

Results: The area under the receiver operating characteristic curve (ROC AUC) of our model was 0.9983 (bootstrap confidence interval [bsCI]: 0.9983–0.9985).

Conclusions: The GBDT model detected specimen mix-ups with high accuracy. The improved accuracy will enable more facilities to perform more efficient and centralized mix-up detection, leading to improved patient safety.
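The data preparation described in the Methods (eligibility filter, sliding window of size 4, and artificial mix-ups) can be illustrated with a minimal sketch. It is not the authors' code. The item names in `ITEMS`, the helper names, and the choice to filter ineligible examinations before windowing are assumptions made for illustration. The abstract says only that the last examinations were "shuffled"; here that step is approximated by replacing the last examination of a window with the last examination of a window from a different patient.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# One examination: item name -> measured value (None or absent if not measured).
Exam = Dict[str, Optional[float]]
Window = List[Exam]

# Hypothetical item names; the abstract does not list the 15 items.
ITEMS = [f"item_{i:02d}" for i in range(1, 16)]


def is_eligible(exam: Exam, items: Sequence[str] = ITEMS, min_items: int = 11) -> bool:
    """True if at least `min_items` of the selected items were measured."""
    return sum(exam.get(name) is not None for name in items) >= min_items


def sliding_windows(exams: Sequence[Exam], size: int = 4) -> List[Window]:
    """Every run of `size` consecutive examinations, oldest first."""
    return [list(exams[i:i + size]) for i in range(len(exams) - size + 1)]


def build_samples(
    patients: Dict[str, Sequence[Exam]], size: int = 4, seed: int = 0
) -> List[Tuple[Window, int]]:
    """Return (window, label) pairs: label 0 = genuine, 1 = artificial mix-up."""
    rng = random.Random(seed)

    # Collect the windows of every patient, remembering who they belong to.
    windows: List[Tuple[str, Window]] = []
    for pid, exams in patients.items():
        # Assumption: ineligible examinations are dropped before windowing.
        eligible = [e for e in exams if is_eligible(e)]
        windows.extend((pid, w) for w in sliding_windows(eligible, size))

    # Genuine samples: windows exactly as observed.
    samples: List[Tuple[Window, int]] = [(w, 0) for _, w in windows]

    # Artificial mix-ups: swap in the last examination of another patient's window.
    for pid, w in windows:
        donors = [d[-1] for d_pid, d in windows if d_pid != pid]
        if donors:  # skipped when only one patient has windows
            samples.append((w[:-1] + [rng.choice(donors)], 1))
    return samples
```

Each labeled window would then be flattened into a feature vector for the GBDT classifier. Splitting into development and validation sets by patient rather than by window is one way to keep overlapping windows from appearing on both sides, although the abstract does not state how the split was made.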