Using Predictive Analytics to Identify Children at High Risk of Defaulting From a Routine Immunization Program: Feasibility Study

Background: Despite the availability of free routine immunizations in low- and middle-income countries, many children are incompletely vaccinated, are vaccinated late for their age, or drop out before completing the immunization schedule. Without tools to model and visualize risk across large datasets, vaccinators and policy makers cannot identify the target groups and individuals at high risk of dropping out. As a result, default rates remain high and universal immunization coverage remains out of reach. Predictive analytics algorithms draw on artificial intelligence, using statistical modeling, machine learning, and multidimensional data mining to accurately identify the children most likely to delay or miss their follow-up immunization visits.

Objective: This study aimed to conduct feasibility testing and validation of a predictive analytics algorithm that identifies children likely to default on subsequent immunization visits for any vaccine included in the routine immunization schedule.

Methods: The algorithm was developed using 47,554 longitudinal immunization records, which were divided into training and validation cohorts. Four machine learning models (random forest, recursive partitioning, support vector machines [SVMs], and C-forest) were used to generate the algorithm, which predicts the likelihood that each child defaults on the follow-up immunization visit. The following variables were used in the models as predictors of defaulting: the child's gender, the language spoken in the child's home, the child's place of residence (town or city), the enrollment vaccine, the timeliness of vaccination, the enrolling staff member (vaccinator or other staff), whether the child's date of birth was accurate or estimated, and the child's age group. The models were encapsulated in a predictive engine, which identified the most appropriate method to use in a given case.
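The Methods describe eight categorical predictors and a split of the records into training and validation cohorts. The following standard-library Python sketch shows one way that preparation step could look. It is an illustration under stated assumptions, not the study's pipeline: the field names, the one-hot encoding, and the 25% holdout fraction are all hypothetical (the abstract does not describe the encoding, and 11,889 of 47,554 records simply works out to about one quarter).

```python
import random
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ImmunizationRecord:
    # Hypothetical field names for the eight predictors listed in the abstract.
    gender: str
    home_language: str
    residence: str            # "town" or "city"
    enrollment_vaccine: str
    timeliness: str           # e.g. "on_time" or "late" (assumed coding)
    enrolling_staff: str      # "vaccinator" or "other"
    dob_type: str             # "accurate" or "estimated"
    age_group: str
    defaulted: bool           # outcome: defaulted on the follow-up visit


# Every field except the outcome label is a predictor.
PREDICTORS = [f.name for f in fields(ImmunizationRecord) if f.name != "defaulted"]


def build_vocabulary(records):
    """Map every (predictor, level) pair seen in the given records to a column index."""
    levels = sorted({(p, getattr(r, p)) for r in records for p in PREDICTORS})
    return {pair: i for i, pair in enumerate(levels)}


def one_hot(record, vocab):
    """Encode one record as a 0/1 vector; levels absent from vocab stay all-zero."""
    row = [0] * len(vocab)
    for p in PREDICTORS:
        idx = vocab.get((p, getattr(record, p)))
        if idx is not None:
            row[idx] = 1
    return row


def holdout_split(records, validation_fraction=0.25, seed=42):
    """Shuffle with a fixed seed and return (training, validation) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]
```

Building the vocabulary from the training cohort only and then encoding both cohorts with it keeps the validation data from shaping the encoding. Because the records are longitudinal, one child may contribute several records; splitting by a child identifier rather than by record would keep the same child out of both cohorts. The abstract does not say how the split was made.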
Each model was assessed in terms of accuracy, precision (positive predictive value), sensitivity, specificity, negative predictive value, and area under the receiver operating characteristic curve (AUC).

Results: Of the 11,889 cases in the validation dataset, the random forest model correctly predicted 8994, with 94.9% sensitivity and 54.9% specificity. The C-forest, SVM, and recursive partitioning models improved on this, correctly predicting 352, 376, and 389 more cases, respectively, than the random forest model. All models had a C-statistic (AUC) of 0.750 or above, and the highest (AUC 0.791, 95% CI 0.784-0.798) was observed for the recursive partitioning algorithm.

Conclusions: This feasibility study demonstrates that predictive analytics can accurately identify children at higher risk of defaulting on follow-up immunization visits. Correctly identifying potential defaulters opens a window for evidence-based, targeted interventions in resource-limited settings to achieve optimal immunization coverage and timeliness.
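The evaluation measures named in the Methods are all functions of a 2×2 confusion matrix, and for a binary outcome the C-statistic equals the ROC AUC. The standard-library Python sketch below makes those definitions concrete. It assumes that "positive" means a child who defaults, although the abstract does not state which class was treated as positive. The class and function names are hypothetical. The final lines convert the reported Results counts into the overall accuracy each model implies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    tp: int  # predicted defaulter, actually defaulted
    fp: int  # predicted defaulter, did not default
    tn: int  # predicted non-defaulter, did not default
    fn: int  # predicted non-defaulter, actually defaulted

    @property
    def total(self):
        return self.tp + self.fp + self.tn + self.fn

    def accuracy(self):
        return (self.tp + self.tn) / self.total

    def precision(self):  # positive predictive value
        return self.tp / (self.tp + self.fp)

    def sensitivity(self):  # true positive rate (recall)
        return self.tp / (self.tp + self.fn)

    def specificity(self):  # true negative rate
        return self.tn / (self.tn + self.fp)

    def npv(self):  # negative predictive value
        return self.tn / (self.tn + self.fn)


def auc(scores, labels):
    """C-statistic: probability that a randomly chosen positive case scores higher
    than a randomly chosen negative case, counting ties as one half.
    This pairwise version is O(P*N) and is meant for clarity, not speed."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Overall accuracy implied by the counts reported in the Results.
VALIDATION_CASES = 11_889
RF_CORRECT = 8_994
EXTRA_CORRECT = {"random forest": 0, "C-forest": 352, "SVM": 376, "recursive partitioning": 389}

for model, extra in EXTRA_CORRECT.items():
    print(f"{model}: {(RF_CORRECT + extra) / VALIDATION_CASES:.1%}")
# random forest: 75.6%
# C-forest: 78.6%
# SVM: 78.8%
# recursive partitioning: 78.9%
```

At the scale of the validation cohort (about 12,000 cases), a rank-based AUC computation or a library routine such as scikit-learn's `roc_auc_score` is more practical than the pairwise loop.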
