Improving the accessibility and transferability of machine learning algorithms for identification of animals in camera trap images: MLWIC2

Abstract Motion-activated wildlife cameras (or "camera traps") are frequently used to observe animals remotely and noninvasively. The vast number of images collected by camera trap projects has prompted some biologists to employ machine learning algorithms to automatically recognize species in these images, or at least to filter out images that do not contain animals. These approaches are often limited by model transferability: a model trained to recognize species in one location may not work as well for the same species in different locations. Furthermore, these methods often require advanced computational skills, making them inaccessible to many biologists. We used 3 million camera trap images from 18 studies in 10 states across the United States of America to train two deep neural networks: one that recognizes 58 species (the "species model") and one that determines whether an image is empty or contains an animal (the "empty-animal model"). Our species model and empty-animal model had accuracies of 96.8% and 97.3%, respectively. Furthermore, the models performed well on some out-of-sample datasets: the species model achieved 91% accuracy on species from Canada (accuracy ranged from 36% to 91% across all out-of-sample datasets), and the empty-animal model achieved 91%–94% accuracy on out-of-sample datasets from different continents. Our software addresses some of the limitations of using machine learning to classify images from camera traps. By including many species from several locations, our species model is potentially applicable to many camera trap studies in North America. We also found that our empty-animal model can facilitate removal of images without animals globally.
We provide the trained models in an R package (MLWIC2: Machine Learning for Wildlife Image Classification in R), which contains Shiny applications that allow scientists with minimal programming experience to use the trained models and to train new models using six neural network architectures of varying depths.
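The empty-animal model's role as a filtering step can be sketched generically. The snippet below is an illustrative assumption, not MLWIC2's actual R API: it takes per-image probabilities from some binary "animal present" classifier (the model itself is not shown) and partitions filenames into likely-animal and likely-empty sets, the kind of triage the abstract describes for removing images without animals before human review.

```python
# Partition camera trap images by a binary "empty vs. animal" score.
# `scores` stands in for classifier output: for each filename, the
# predicted probability that the image contains an animal. The function
# name and threshold are illustrative, not part of the MLWIC2 package.

def partition_images(scores, threshold=0.5):
    """Split {filename: p_animal} into (animal, empty) filename lists."""
    animal = [f for f, p in scores.items() if p >= threshold]
    empty = [f for f, p in scores.items() if p < threshold]
    return animal, empty

# Example with made-up scores: images below the threshold would be
# set aside as empty, sparing a human reviewer from inspecting them.
scores = {"img_001.jpg": 0.97, "img_002.jpg": 0.03, "img_003.jpg": 0.88}
animal, empty = partition_images(scores)
```

Raising the threshold trades fewer false "animal" images against a greater risk of discarding images that do contain animals, so in practice the cutoff would be tuned against a labeled sample.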
