DICENet: Fine-Grained Recognition via Dilated Iterative Contextual Encoding

Material recognition is an intriguing problem in computer vision. While traditional approaches rely on an ensemble of networks to capture essential properties such as texture, more recent approaches use deep learning to design end-to-end models. We follow this direction and propose the Dilated Iterative Contextual Encoding Network (DICENet), a novel end-to-end framework for material recognition. Drawing on extensive knowledge of the various characteristics of materials, our approach adds components to the base network that each address a specific property, which also helps in general recognition tasks. The standard ResNet is replaced by a Dilated Residual Network to help capture fine-grained material information. Iterative deep aggregation captures and fuses global homogeneous material properties across multiple resolutions and scales. To enhance the discriminative power of the learnt latent representation, we propose gramedial loss, which is applied on a texture vector space. Spatial similarity loss is applied on strategic intermediate feature maps to capture local non-homogeneous texture features from a global context, which is crucial for the primary classification task. Extensive experiments on gold-standard material datasets such as FMD, MINC-2500, KTH-TIPS-2b, DTD and GTOS indicate improved performance over state-of-the-art approaches on large datasets and two small datasets, while achieving comparable accuracy on the challenging FMD. Furthermore, our architecture also performed convincingly on general indoor scene and object classification datasets such as MITIndoor and CalTech-101.
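The abstract gives no implementation details, so the following is only a minimal sketch of the general idea behind dilated convolution, the operation a Dilated Residual Network uses in place of striding to enlarge the receptive field while keeping spatial resolution. It is not the authors' code: the function names `effective_kernel_size` and `dilated_conv2d`, the box-filter kernel, and the 7x7 toy input are all assumptions chosen for illustration.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span covered by a kernel whose taps are spaced `dilation` pixels apart."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv2d(image, kernel, dilation=1):
    """Valid (unpadded) 2D cross-correlation with a square dilated kernel.

    `image` and `kernel` are lists of lists of floats (illustrative only).
    """
    k = len(kernel)
    span = effective_kernel_size(k, dilation)
    out_h = len(image) - span + 1
    out_w = len(image[0]) - span + 1
    if out_h <= 0 or out_w <= 0:
        raise ValueError("image is smaller than the dilated kernel span")
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            total = 0.0
            for a in range(k):
                for b in range(k):
                    # Kernel taps are sampled `dilation` pixels apart.
                    total += kernel[a][b] * image[i + a * dilation][j + b * dilation]
            out[i][j] = total
    return out


# Toy 7x7 input whose pixel value is its row-major index.
image = [[float(r * 7 + c) for c in range(7)] for r in range(7)]
box = [[1.0] * 3 for _ in range(3)]  # 3x3 kernel of ones

dense = dilated_conv2d(image, box, dilation=1)    # each output sees a 3x3 window
dilated = dilated_conv2d(image, box, dilation=2)  # each output sees a 5x5 window
```

With the same nine weights, the dilation-2 kernel spans a 5x5 window instead of 3x3, so each output draws on a wider context without any downsampling of the feature map.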