Misidentification detection in a citizen science platform for biodiversity monitoring using machine learning

Abstract In recent years, several citizen science platforms for biodiversity monitoring have emerged. These platforms are a powerful tool for collecting biodiversity data for researchers and for increasing participants' knowledge. Typical biodiversity data consist of the names of species observed at a given time and place by numerous participants. Documenting observations with photos allows data validation, in particular validation of species identification, a key aspect of quality control for such databases. However, the growing volume of collected data is a major challenge given the limited number of co-opted experts dedicated to data validation. Detecting misidentifications can therefore help focus the limited expert workforce on dubious identifications. In this paper, we test various machine learning approaches to detect misidentifications in such databases, based on features extracted from the history of validated observations. The proposed model can be used to automate the data validation process in the SPIPOLL platform.