Animal Scanner: Software for classifying humans, animals, and empty frames in camera trap images

Abstract

Camera traps are a popular tool for sampling animal populations because they are noninvasive, detect a variety of species, and can record many thousands of animal detections per deployment. Cameras are typically set to take bursts of multiple photographs for each detection and are deployed in arrays of dozens or hundreds of sites, often resulting in millions of photographs per study. Converting such large image collections into animal detection records is daunting, and it is made worse by situations that generate copious empty pictures from false triggers (e.g., camera malfunction or moving vegetation) or pictures of humans. We developed computer vision algorithms that detect and classify moving objects to aid the first step of camera trap image filtering: separating the animal detections from the empty frames and pictures of humans. Our new work couples foreground object segmentation through background subtraction with deep learning classification to provide a fast and accurate scheme for human–animal detection. We provide these programs as both a Matlab GUI and a command-line program developed in C++. The software reads folders of camera trap images and outputs images annotated with bounding boxes around moving objects, together with a text file summarizing the results. The software maintains high accuracy while reducing execution time by a factor of 14. It takes about 6 seconds to process a sequence of ten frames on a computer with a 2.6 GHz CPU. For cameras with excessive empty frames due to camera malfunction or blowing vegetation, the software automatically removes 54% of the false-trigger sequences without affecting the human/animal sequences. We achieve 99.58% accuracy on image-level empty versus object classification on the Serengeti dataset. We offer the first computer vision tool for processing camera trap images, providing substantial time savings for large image datasets and thus improving our ability to monitor wildlife across large scales with camera traps.
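The idea of segmenting moving objects by background subtraction within a burst sequence can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple per-pixel median background model on small grayscale frames stored as nested lists, it omits the deep learning classification stage entirely, and the function names (`median_background`, `foreground_box`, `label_sequence`) and the threshold value are hypothetical choices made for illustration.

```python
from statistics import median


def median_background(frames):
    """Estimate the background as the per-pixel median over one sequence."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(width)]
            for y in range(height)]


def foreground_box(frame, background, threshold=30):
    """Return (x_min, y_min, x_max, y_max) around pixels that differ from the
    background by more than `threshold`, or None if no pixel does."""
    hits = [(x, y)
            for y, row in enumerate(frame)
            for x, value in enumerate(row)
            if abs(value - background[y][x]) > threshold]
    if not hits:
        return None
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return (min(xs), min(ys), max(xs), max(ys))


def label_sequence(frames, threshold=30):
    """Label each frame 'empty' or 'object' and attach its bounding box.

    A real pipeline would pass each 'object' region to a classifier
    (human vs. animal); that step is intentionally left out here.
    """
    background = median_background(frames)
    results = []
    for frame in frames:
        box = foreground_box(frame, background, threshold)
        results.append(("object" if box is not None else "empty", box))
    return results


if __name__ == "__main__":
    # Three 4x4 frames of uniform brightness; the middle one has a bright blob.
    frames = [[[10] * 4 for _ in range(4)] for _ in range(3)]
    frames[1][1][1] = 200
    frames[1][1][2] = 200
    for index, (label, box) in enumerate(label_sequence(frames)):
        print(index, label, box)
```

A median background only works when the object is absent from most frames of the sequence; an animal that stays in place for the whole burst would be absorbed into the background, which is one reason more robust background models are used in practice.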
