Detection of Birds in the Wild using Deep Learning Methods

Object detection and localization is one of the prominent applications of computer vision. This paper presents a comparative study of three state-of-the-art deep learning methods, YOLOv2, YOLOv3 and Mask R-CNN, for detecting birds in the wild. Bird detection is an important problem in several applications, including aviation safety, avian protection and the ecological study of migratory bird species. Deep learning based methods are well suited to detecting and localizing birds in images because they can cope with birds of diverse shapes and sizes and, most importantly, with the complex backgrounds in which they appear. We used the training and testing datasets provided by the NCVPRIG (BROID) conference, which contain 325 and 275 images respectively. For training, we started from models pre-trained on the VOC 2012 and COCO datasets and fine-tuned them on the 325 training images. Using F-score as one of the performance metrics, we obtained F-scores of 0.8140, 0.8721 and 0.8688 for YOLOv2, YOLOv3 and Mask R-CNN respectively. The results show that YOLOv3 outperforms YOLOv2 and marginally outperforms Mask R-CNN.
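As an illustration of the F-score metric mentioned above, the following is a minimal sketch of how an F-score could be computed for bounding-box detections. It is not the evaluation code used in the paper: the IoU threshold of 0.5, the greedy one-to-one matching, and the function names `iou` and `f_score` are all assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def f_score(predictions, ground_truth, iou_threshold=0.5):
    """F-score for one image; the 0.5 threshold is an assumed default."""
    matched = set()  # indices of ground-truth boxes already claimed
    tp = 0
    for p in predictions:
        # Greedily pick the best still-unmatched ground-truth box.
        best_iou, best_j = 0.0, None
        for j, g in enumerate(ground_truth):
            if j in matched:
                continue
            v = iou(p, g)
            if v > best_iou:
                best_iou, best_j = v, j
        if best_j is not None and best_iou >= iou_threshold:
            matched.add(best_j)
            tp += 1
    fp = len(predictions) - tp   # detections with no matching bird
    fn = len(ground_truth) - tp  # birds that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with two predicted boxes of which one matches a ground-truth bird and one ground-truth bird left undetected, precision and recall are both 0.5, so the F-score is 0.5. A dataset-level score would typically accumulate true positives, false positives and false negatives over all test images before computing the ratio.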
