On pseudo-absence generation and machine learning for locust breeding ground prediction in Africa

Desert locust outbreaks threaten the food security of a large part of Africa and have affected the livelihoods of millions of people over the years. Furthermore, these outbreaks could become more severe and frequent as a result of global climate change. Machine learning (ML) has been demonstrated as an effective approach to locust distribution modelling, which could assist in early warning. However, ML requires a significant amount of labelled data for training. Most publicly available labelled data on locusts are presence-only data, where only sightings of locusts at a particular location are recorded. Prior work using ML has therefore resorted to pseudo-absence generation methods to circumvent this issue and build balanced datasets for training. The most commonly used approach is to randomly sample points in a region of interest while ensuring that these sampled pseudo-absence points are at least a specific distance away from true presence points. In this paper, we compare this random sampling approach to more advanced pseudo-absence generation methods, such as environmental profiling and optimal background extent limitation, specifically for predicting desert locust breeding grounds in Africa. We tested four algorithms that are popular in prior work: logistic regression (LR), gradient boosting, random forests and maximum entropy. Interestingly, we find that LR performed significantly better (p-value < 2 × 10^-16) than the more sophisticated ensemble methods, both in terms of prediction accuracy and F1 score. Although background extent limitation combined with random sampling seemed to boost performance for the ensemble methods, this was not the case for LR, for which a significant improvement was instead obtained when using environmental profiling.
In light of this, we conclude that a simpler ML approach such as logistic regression combined with more advanced pseudo-absence generation, specifically environmental profiling, can be a sensible and effective approach to predicting locust breeding grounds across Africa.
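
The baseline random sampling approach described above can be illustrated with a minimal sketch. It is not the implementation used in the paper: the function names, the rectangular bounding box, the use of great-circle (haversine) distance and the parameter values in the usage note are all assumptions made for illustration.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees (broadcasts)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def sample_pseudo_absences(presences, bbox, n, min_dist_km, seed=0, max_tries=100):
    """Hypothetical helper: draw up to n random points inside bbox that lie at
    least min_dist_km away from every presence point.

    presences: array of shape (m, 2) holding (lat, lon) in degrees.
    bbox: (lat_min, lat_max, lon_min, lon_max) in degrees (assumed region of interest).
    Note: sampling uniformly in lat/lon is not uniform in area; this is a simplification.
    """
    rng = np.random.default_rng(seed)
    lat_min, lat_max, lon_min, lon_max = bbox
    kept = []
    for _ in range(max_tries):
        need = n - len(kept)
        if need == 0:
            break
        # Draw as many candidates as are still missing.
        lat = rng.uniform(lat_min, lat_max, size=need)
        lon = rng.uniform(lon_min, lon_max, size=need)
        # Distance from every candidate to every presence point: shape (need, m).
        dist = haversine_km(lat[:, None], lon[:, None],
                            presences[None, :, 0], presences[None, :, 1])
        # Keep candidates whose nearest presence point is far enough away.
        ok = dist.min(axis=1) >= min_dist_km
        kept.extend(zip(lat[ok], lon[ok]))
    return np.array(kept[:n]).reshape(-1, 2)
```

For example, `sample_pseudo_absences(np.array([[15.0, 30.0], [16.0, 31.0]]), (10, 20, 25, 35), n=50, min_dist_km=50.0)` returns an array of 50 (lat, lon) pairs, which could then be labelled as absences and combined with the presence points to form a balanced training set. The exclusion distance and region are illustrative values only.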
