A deep active learning system for species identification and counting in camera trap images

A typical camera trap survey may produce millions of images that require slow, expensive manual review. Consequently, critical conservation questions may be answered too slowly to support decision-making. Recent studies have demonstrated the potential for computer vision to dramatically increase efficiency in image-based biodiversity surveys; however, the literature has focused on projects with a large set of labelled training images, so projects with smaller labelled datasets cannot benefit from existing machine learning techniques. Furthermore, even sizable projects have struggled to adopt computer vision methods because classification models overfit to specific image backgrounds (i.e. camera locations). In this paper, we combine machine intelligence and human intelligence via a novel active learning system to minimize the manual work required to train a computer vision model. In addition, we use object detection models and transfer learning to prevent overfitting to camera locations. To our knowledge, this is the first work to apply an active learning approach to camera trap images. Our proposed scheme matches state-of-the-art accuracy on a 3.2 million image dataset with as few as 14,100 manual labels, a reduction in manual labelling effort of over 99.5%. Our trained models are also less dependent on background pixels, since they operate only on cropped regions around animals. The proposed active deep learning scheme can significantly reduce the manual labour required to extract information from camera trap images. Automating information extraction will not only benefit existing camera trap projects but can also catalyse the deployment of larger camera trap arrays.
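
The abstract does not spell out which query strategy the active learning system uses; as a hedged illustration only, the sketch below shows least-confidence uncertainty sampling, a common active learning baseline in which the examples the current model is least sure about are sent to human annotators. The helper names in the commented loop (`predict_proba`, `oracle_label`, `train`) are hypothetical placeholders, not the paper's API.

```python
import numpy as np

def least_confidence_query(probs: np.ndarray, k: int) -> np.ndarray:
    """Pick the k unlabelled samples whose top-class probability is lowest.

    probs: (n_samples, n_classes) softmax outputs over the unlabelled pool.
    A low top-class probability means the model is uncertain, so a human
    label for that sample is likely to be especially informative.
    """
    confidence = probs.max(axis=1)      # confidence in the predicted class
    return np.argsort(confidence)[:k]   # indices of the k least-confident samples

# One round of the loop (model, pools, and helpers below are hypothetical
# stand-ins for a project's own components):
#   probs = predict_proba(model, unlabelled_pool)
#   query_idx = least_confidence_query(probs, k=1000)
#   labelled_set += oracle_label(unlabelled_pool, query_idx)  # human review
#   model = train(model, labelled_set)                        # retrain / fine-tune
```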
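The detection step can be sketched in the same hedged spirit: cropping each detected animal before classification discards most location-specific background pixels, which is what makes the trained models less dependent on camera locations. The snippet below assumes bounding boxes in pixel coordinates from any off-the-shelf object detector (e.g. a Faster R-CNN trained to localize animals); the padding fraction is an illustrative choice, not a value from the paper.

```python
from PIL import Image

def crop_detections(image_path: str,
                    boxes: list[tuple[float, float, float, float]],
                    pad: float = 0.1) -> list[Image.Image]:
    """Crop each detected animal, with a small padding margin, so the
    downstream classifier sees the animal rather than the camera-location
    background.

    boxes: (xmin, ymin, xmax, ymax) pixel coordinates from an object
    detector; pad adds a margin of `pad` times the box size on each side.
    """
    img = Image.open(image_path)
    w, h = img.size
    crops = []
    for xmin, ymin, xmax, ymax in boxes:
        dx, dy = pad * (xmax - xmin), pad * (ymax - ymin)
        crop_box = (int(max(0, xmin - dx)), int(max(0, ymin - dy)),
                    int(min(w, xmax + dx)), int(min(h, ymax + dy)))
        crops.append(img.crop(crop_box))
    return crops
```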
