Combining deep features for object detection at various scales: finding small birds in landscape images

Demand for automatic bird ecology investigation rises rapidly along with the widespread installation of wind energy plants to estimate their adverse environmental effect. While significant advance in general image recognition has been made by deep convolutional neural networks (CNNs), automatically recognizing birds at small scale together with large background regions is still an open problem in computer vision. To tackle object detection at various scales, we combine a deep detector with semantic segmentation methods; namely, we train a deep CNN detector, fully convolutional networks (FCNs), and the variant of FCNs, and integrate their results by the support vector machines to achieve high detection performance. Through experimental results on a bird image dataset, we show the effectiveness of the method for scale-aware object detection.

[1]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[3]  Takeshi Naemura,et al.  Construction of a bird image dataset for ecological investigations , 2015, 2015 IEEE International Conference on Image Processing (ICIP).

[4]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[6]  Nitish Srivastava,et al.  Dropout: a simple way to prevent neural networks from overfitting , 2014, J. Mach. Learn. Res..

[7]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[8]  Takeshi Naemura,et al.  Detection of small birds in large images by combining a deep detector with semantic segmentation , 2016, 2016 IEEE International Conference on Image Processing (ICIP).

[9]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Svetlana Lazebnik,et al.  Superparsing - Scalable Nonparametric Image Parsing with Superpixels , 2010, International Journal of Computer Vision.

[11]  Sanja Fidler,et al.  The Role of Context for Object Detection and Semantic Segmentation in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Michael L. Morrison,et al.  Influence of Behavior on Bird Mortality in Wind Energy Developments , 2009 .

[13]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[15]  Jian Dong,et al.  Towards Unified Object Detection and Semantic Segmentation , 2014, ECCV.

[16]  Svetlana Lazebnik,et al.  Finding Things: Image Parsing with Regions and Per-Exemplar Detectors , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  J. Uijlings,et al.  UvA-DARE (Digital Academic Repository) Selective Search for Object Recognition Selective Search for Object Recognition , 2013 .

[18]  Koen E. A. van de Sande,et al.  Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[19]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[20]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.

[21]  S. Mallat VI – Wavelet zoom , 1999 .

[22]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[23]  S. Mallat A wavelet tour of signal processing , 1998 .

[24]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[25]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[26]  Bernt Schiele,et al.  What Is Holding Back Convnets for Detection? , 2015, GCPR.