Machine Learning Algorithms To Predict The Childhood Anemia In Bangladesh

Corresponding author email: jkhan@isrt.ac.bd

Anemia, especially among children, is a serious public health problem in Bangladesh. Beyond identifying the factors associated with anemia, it is also of interest to estimate the likelihood of anemia given those factors. Predicting disease status is key to community and health service policy making and to forecasting for resource planning. We considered machine learning (ML) algorithms to predict anemia status among children under five years of age, using common risk factors as features. Data were extracted from a nationally representative cross-sectional survey, the Bangladesh Demographic and Health Survey (BDHS), conducted in 2011. For this study, we selected a sample of 2,013 children with complete data on all selected variables. We used six ML algorithms to predict childhood anemia status: linear discriminant analysis (LDA), classification and regression trees (CART), k-nearest neighbors (k-NN), support vector machines (SVM), random forest (RF), and logistic regression (LR). The algorithms were evaluated systematically in terms of accuracy, sensitivity, specificity, and area under the curve (AUC). The RF algorithm achieved the best classification accuracy, 68.53%, with a sensitivity of 70.73%, a specificity of 66.41%, and an AUC of 0.6857. By comparison, the classical LR algorithm reached a classification accuracy of 62.75%, with a sensitivity of 63.41%, a specificity of 62.11%, and an AUC of 0.6276. Among all the algorithms considered, k-NN gave the lowest accuracy. We conclude that ML methods can be considered in addition to classical regression techniques when prediction of anemia is the primary focus.
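The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. The confusion-matrix counts below are hypothetical: they are chosen only so that the resulting rates round to the figures reported for RF, and they are not taken from the paper, which does not give its confusion matrix in the abstract. The function names are likewise illustrative. The sketch also shows that the reported AUC values are consistent with the area under a ROC curve drawn through a single operating point, which equals the mean of sensitivity and specificity; whether the AUC was actually computed this way is an assumption.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)              # true positive rate among anemic children
    specificity = tn / (tn + fp)              # true negative rate among non-anemic children
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


def single_threshold_auc(sensitivity, specificity):
    """Area under the ROC polyline (0,0) -> (1 - specificity, sensitivity) -> (1,1).

    For a classifier that outputs hard labels, this trapezoidal area
    simplifies to the mean of sensitivity and specificity.
    """
    return (sensitivity + specificity) / 2


# Hypothetical counts (NOT from the paper), picked to reproduce the reported RF rates.
rf = confusion_metrics(tp=174, fn=72, tn=170, fp=86)
rf_auc = single_threshold_auc(rf["sensitivity"], rf["specificity"])

# Reported LR sensitivity and specificity, taken from the abstract.
lr_auc = single_threshold_auc(0.6341, 0.6211)

print({name: round(value, 4) for name, value in rf.items()})
print("RF single-threshold AUC:", round(rf_auc, 4))
print("LR single-threshold AUC:", round(lr_auc, 4))
```

Under these assumptions, the mean of the reported sensitivity and specificity matches the reported AUC for both RF and LR to four decimal places, which suggests the AUC values describe one classification threshold rather than a full ranking of predicted probabilities.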
