Prediction for global African swine fever outbreaks based on a combination of random forest algorithms and meteorological data.

African swine fever (ASF) is a virulent infectious disease of pigs. Because no effective vaccine or treatment is currently available, an outbreak poses a great threat to the pig industry. In this paper, we combined ASF outbreak data with meteorological data from the WorldClim database, and used the CfsSubset Evaluator-Best First feature selection method together with the random forest algorithm to construct an ASF outbreak prediction model. We also built an independent test set from data not used for modeling; the accuracy (ACC) of the model on this independent test set ranged from 76.02% to 84.64%, which indicates that the model performed well and that its prediction accuracy was higher than previous estimates. In addition, logistic regression analysis was conducted on the 12 features used for modeling, and ROC curves were drawn. The results showed that the bio14 feature (precipitation of the driest month) contributed most to ASF outbreaks, which suggests that outbreaks of the epidemic are significantly related to precipitation. Finally, we used this qualitative prediction model to build a global online prediction system for ASF outbreaks, in the hope that this study will help decision-makers take relevant prevention and control measures to prevent the further spread of future epidemics of the disease.
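
The abstract reports two evaluation quantities: accuracy (ACC) on an independent test set and ROC curves for individual features. As a rough illustration of how these are usually computed, the following minimal sketch uses only the Python standard library. The function names `accuracy` and `auc` and all data values are hypothetical and chosen for illustration; they are not the authors' code or data, and the sketch does not reproduce the feature selection or random forest steps.

```python
def accuracy(y_true, y_pred):
    """ACC = (TP + TN) / total: share of samples classified correctly."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve in its pairwise (Mann-Whitney) form.

    Returns the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up values (NOT from the paper): 1 = outbreak, 0 = no outbreak.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
# Hypothetical class predictions from some classifier on a held-out set.
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
# Hypothetical per-sample scores derived from a single feature.
feature_score = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

acc = accuracy(y_true, y_pred)
feature_auc = auc(y_true, feature_score)
print(f"ACC = {acc:.2%}, single-feature AUC = {feature_auc:.4f}")
```

An AUC near 0.5 means the feature separates the two classes no better than chance, while a value near 1 means it ranks outbreak samples above non-outbreak samples almost perfectly. Comparing per-feature AUC values is one common way to rank feature contributions; whether the authors ranked bio14 in exactly this way is not stated in the abstract.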
