Deep convolutional neural network based species recognition for wild animal monitoring

We propose a novel deep convolutional neural network based species recognition algorithm for wild animal classification on very challenging camera-trap imagery data. The imagery data were captured with motion-triggered camera traps and were segmented automatically using a state-of-the-art graph-cut based algorithm. The moving foreground is selected as the region of interest and is fed to the proposed species recognition algorithm. For comparison, we use the traditional bag of visual words model as the baseline species recognition algorithm. The proposed deep convolutional neural network based species recognition achieves superior performance over this baseline. To the best of our knowledge, this is the first attempt at fully automatic computer vision based species recognition on real camera-trap images. We also collected and annotated a standard camera-trap dataset of 20 species common in North America, which contains 14,346 training images and 9,530 testing images and is publicly available for evaluation and benchmarking.
