Deep Multi-Context Network for Fine-Grained Visual Recognition

In this paper, we tackle the fine-grained visual recognition problem by proposing a deep multi-context framework. We employ deep convolutional neural networks to model the features of objects in images. Both global and local context are taken into consideration and are jointly modeled in a unified multi-context deep learning framework. To clean the relatively noisy training data, we design a region proposal method that makes the multi-context modeling suitable for fine-grained visual recognition in the real world. Furthermore, we use recently proposed deep models and investigate their combination. Our approaches are evaluated on MSR-IRC 2016 and further assessed on the more complex validation set. The results show significant and consistent improvements over the baseline.
