Using machine-learning approaches to predict non-participation in a nationwide general health check-up scheme

BACKGROUND Since the launch of a nationwide general health check-up and instruction program in Japan in 2008, interest in formulating an effective and efficient strategy to improve the participation rate has been growing. The aim of this study was to develop and evaluate models that identify individuals who are unlikely to undergo general health check-ups, using machine-learning methods to select intervention targets more efficiently.

METHODS We used information from a local government database in Japan. The study population included 7290 individuals aged 40-74 years who underwent at least one general health check-up between 2012 and 2015. We developed four predictive models based on the extreme gradient boosting (XGBoost), random forest (RF), support vector machine (SVM), and logistic regression (LR) algorithms, and compared the areas under the curve (AUCs) of the models with that of the heuristic method (which presumes that individuals who underwent a general health check-up in the previous year will do so again in the following year).

RESULTS The AUCs for the XGBoost, RF, SVM, and LR models and for the heuristic method were 0.829 (95% confidence interval [CI]: 0.806-0.853), 0.821 (95% CI: 0.797-0.845), 0.812 (95% CI: 0.787-0.837), 0.816 (95% CI: 0.791-0.841), and 0.683 (95% CI: 0.657-0.708), respectively. The XGBoost model exhibited the best AUC, and its performance was significantly better than that of the SVM model (p = 0.034), the LR model (p = 0.017), and the heuristic method (p < 0.001). However, the performance of XGBoost did not differ significantly from that of RF (p = 0.229).

CONCLUSION Predictive models using machine-learning techniques outperformed the existing heuristic method in predicting participation in a general health check-up system by eligible participants.
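The comparison described in the abstract, a binary "attended last year" heuristic against a model that outputs a continuous risk score, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the eight-person toy data, the variable names, and the hypothetical model scores are invented and are not the study's data, code, or results. The AUC is computed directly from its rank interpretation (the probability that a randomly chosen non-participant receives a higher score than a randomly chosen participant, with ties counted as one half).

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT from the study).
# attended_last_year: 1 = underwent a check-up in the previous year.
# non_participation:  1 = did not undergo a check-up in the target year.
attended_last_year = [1, 1, 0, 0, 1, 0, 1, 0]
non_participation = [0, 0, 1, 1, 1, 0, 0, 1]

# Heuristic: people who attended last year are presumed to attend again,
# so the predicted risk of non-participation is 1 - attended_last_year.
heuristic_score = [1 - a for a in attended_last_year]

# Hypothetical continuous risk scores, standing in for any fitted model.
model_score = [0.1, 0.2, 0.9, 0.7, 0.4, 0.5, 0.3, 0.8]

heuristic_auc = auc(non_participation, heuristic_score)
model_auc = auc(non_participation, model_score)
print(f"heuristic AUC = {heuristic_auc:.4f}, model AUC = {model_auc:.4f}")
```

Because the heuristic produces only two distinct scores, many positive-negative pairs are tied, which limits how well it can rank individuals; a continuous score can separate such pairs. Testing whether two AUCs differ significantly requires a method for correlated ROC curves and is not shown in this sketch.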
