A new method to control error rates in automated species identification with deep learning algorithms

Processing data from photo or video surveys remains a major bottleneck in ecology. Deep Learning Algorithms (DLAs) are increasingly used to automatically identify organisms in images. However, despite recent advances, controlling the error rate of such methods remains difficult. Here, we propose a new framework to control the error rate of DLAs. More precisely, for each species, a confidence threshold is automatically computed using a training dataset independent of the one used to train the DLAs. These species-specific thresholds are then used to post-process the outputs of the DLAs, that is, the classification scores assigned to each class for a given image, and to introduce a new class called “unsure”. We applied this framework to a case study identifying 20 fish species from 13,232 underwater images of coral reefs. The overall rate of species misclassification decreased from 22% with the raw DLAs to 2.98% after post-processing with thresholds defined to minimize the risk of misclassification. This new framework has the potential to unclog the bottleneck of information extraction from massive digital data while ensuring a high level of accuracy in biodiversity assessment.
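The post-processing idea described above can be illustrated with a minimal sketch. The abstract does not state the exact rule used to compute the thresholds, so the rule below is an assumption: for each species, take the lowest score threshold at which the retained predictions on the independent dataset have an error rate no greater than a target. The names `species_thresholds`, `post_process`, `max_error` and the `UNSURE` label are hypothetical and introduced only for illustration.

```python
import numpy as np

UNSURE = -1  # hypothetical integer label standing in for the "unsure" class


def species_thresholds(scores, labels, max_error=0.05):
    """Compute one confidence threshold per class (assumed rule, see lead-in).

    scores: (n_images, n_classes) classification scores from the DLA,
            computed on a dataset independent of the DLA training set.
    labels: (n_images,) true class indices for those images.
    """
    n_classes = scores.shape[1]
    predicted = scores.argmax(axis=1)
    top_score = scores.max(axis=1)
    # np.inf means "never accept this class" until a valid threshold is found
    thresholds = np.full(n_classes, np.inf)
    for c in range(n_classes):
        mask = predicted == c
        if not mask.any():
            continue
        s = top_score[mask]
        wrong = labels[mask] != c
        # np.unique returns the candidate thresholds in ascending order
        for t in np.unique(s):
            kept = s >= t
            if wrong[kept].mean() <= max_error:
                thresholds[c] = t
                break
    return thresholds


def post_process(scores, thresholds):
    """Keep the top-scoring class if its score reaches that class's
    threshold; otherwise assign the image to UNSURE."""
    predicted = scores.argmax(axis=1)
    top_score = scores.max(axis=1)
    return np.where(top_score >= thresholds[predicted], predicted, UNSURE)
```

In this sketch, lowering `max_error` trades coverage for accuracy: fewer images receive a species label, and more are sent to "unsure" for human review.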
