Overview of GeoLifeCLEF 2021: Predicting species distribution from 2 million remote sensing images

Understanding the geographic distribution of species is a key concern in conservation. By pairing species occurrences with environmental features, researchers can model the relationship between an environment and the species that may be found there. To advance research in this area, a large-scale machine learning competition called GeoLifeCLEF 2021 was organized. It relied on a dataset of 1.9 million observations of 31K species, mainly animals and plants. These observations were paired with high-resolution remote sensing imagery, land cover data, and altitude, in addition to traditional low-resolution climate and soil variables. The main goal of the challenge was to better understand how to leverage remote sensing data to predict the presence of species at a given location. This paper presents an overview of the competition, synthesizes the approaches used by the participating groups, and analyzes the main results. In particular, we highlight the ability of remote sensing imagery and convolutional neural networks to improve predictive performance, complementing traditional approaches.
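The pairing of occurrences with environmental features can be illustrated with a small sketch. It is illustrative only and makes several assumptions:

- It does not reproduce the competition's data format or any participant's method.
- `Raster`, `pair_with_features`, and `rank_species` are hypothetical names.
- Each environmental layer is a small north-up grid in degrees.
- A k-nearest-neighbour vote stands in for a real species distribution model, purely to show how paired features can rank species at a location.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Hypothetical record types for this sketch: (lat, lon, species) and (features, species).
Occurrence = Tuple[float, float, str]
Row = Tuple[List[float], str]


@dataclass
class Raster:
    """Hypothetical north-up grid of one environmental variable (e.g. altitude)."""
    values: List[List[float]]  # values[row][col]; row 0 is the northernmost row
    west: float                # longitude of the grid's left edge
    north: float               # latitude of the grid's top edge
    cell_size: float           # cell width and height, in degrees

    def sample(self, lat: float, lon: float) -> float:
        """Return the value of the cell containing (lat, lon)."""
        col = int((lon - self.west) // self.cell_size)
        row = int((self.north - lat) // self.cell_size)
        if not (0 <= row < len(self.values) and 0 <= col < len(self.values[0])):
            raise ValueError(f"({lat}, {lon}) lies outside the raster")
        return self.values[row][col]


def pair_with_features(occurrences: Sequence[Occurrence],
                       rasters: Dict[str, Raster]) -> List[Row]:
    """Attach one value per raster layer (in dict order) to each occurrence."""
    return [([r.sample(lat, lon) for r in rasters.values()], species)
            for lat, lon, species in occurrences]


def rank_species(rows: Sequence[Row], query: Sequence[float], k: int = 3) -> List[str]:
    """Rank species by frequency among the k rows closest to `query`
    (squared Euclidean distance on raw, unscaled features; a stand-in model)."""
    nearest = sorted(rows, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)))[:k]
    counts = Counter(species for _, species in nearest)
    return [species for species, _ in counts.most_common()]


# Toy example: two 2x2 layers covering longitudes 0..2 and latitudes 0..2.
rasters = {
    "altitude": Raster([[100, 200], [300, 400]], west=0.0, north=2.0, cell_size=1.0),
    "temperature": Raster([[10, 8], [12, 6]], west=0.0, north=2.0, cell_size=1.0),
}
occurrences = [(1.5, 0.5, "A"), (1.5, 1.5, "A"), (0.5, 1.5, "B"), (0.5, 0.5, "B")]
rows = pair_with_features(occurrences, rasters)
print(rows[0])                            # ([100, 10], 'A')
print(rank_species(rows, [150, 9], k=3))  # ['A', 'B']
```

Distances here are computed on raw values, so the altitude layer dominates the vote. A real pipeline would standardize each variable and read georeferenced rasters instead of nested lists. It would also use a learned model, such as a CNN over image patches, rather than a nearest-neighbour count.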
