Comparing support vector regression and random forests for predicting malaria incidence in Mozambique

Accurate prediction of malaria incidence is essential for managing several activities of the Ministry of Health in Mozambique. This study compares support vector machines (SVMs) and random forests (RFs) for this purpose. A dataset of malaria case records covering the period 1999-2008 was used: models were developed from one up to nine years of historical data and evaluated on the final year. Mean squared error (MSE) was used as the performance metric. The scheme for estimating variable importance commonly employed for RFs was also adopted for SVMs. SVMs developed from two years of historical data achieved the best predictive accuracy. Hence, if the goal is to predict the actual number of malaria cases, the SVM model should be chosen. In the analysis of variable importance, indoor residual spraying (IRS), the districts of Manhiça and Matola, and the month of January turned out to be the most important predictors in both the SVM and RF models.
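The abstract does not spell out the variable-importance scheme. A minimal sketch follows, assuming it is the permutation approach usually associated with RFs: shuffle one predictor at a time and record the increase in MSE. Because it only needs a prediction function, the same procedure can be applied to an SVM. The names `mse`, `permutation_importance` and `toy_predict`, and the toy data, are illustrative assumptions and do not come from the study. RFs typically compute this on out-of-bag samples, whereas this sketch uses whatever evaluation data it is given.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each column of X is shuffled in turn.

    predict: prediction function of an already fitted model (hypothetical
             interface: list of feature rows -> list of predictions).
    """
    rng = random.Random(seed)
    baseline = mse(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other predictors intact
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy stand-in for a fitted model (assumption): uses feature 0, ignores feature 1
def toy_predict(X):
    return [3.0 * row[0] for row in X]


X = [[float(i), float((7 * i) % 5)] for i in range(20)]
y = toy_predict(X)
scores = permutation_importance(toy_predict, X, y)
```

In this toy setup the ignored feature receives an importance of exactly zero, while the feature the model relies on receives a positive score. With a real fitted SVM or RF, `predict` would wrap that model's own prediction call.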
