Multi-Label Visual Feature Learning with Attentional Aggregation

Convolutional neural networks (CNNs) are now being applied to specialized scientific problems that were previously hard to tackle adequately. In this paper, we systematically study a multi-label annotation problem for x-ray scattering images in materials science. For this application, we address an open challenge in training CNNs: identifying weak scattering patterns against diffuse background interference, which is common in scientific imaging. We propose an Attentional Aggregation Module (AAM) to enhance feature representations. First, we reweight and highlight important image features using data-driven attention maps, which we decompose into channel and spatial attention components. In the spatial attention component, we design a mechanism that generates multiple spatial attention maps tailored to diversified multi-label learning. Then, we condense the enhanced local features into non-local representations through feature aggregation. Both attention and aggregation are implemented as network layers with learnable parameters, so CNN training remains end-to-end, and we apply the module at several points in the network so that feature enhancement is multi-scale. We conduct extensive experiments on CNN training and testing as well as transfer learning; the empirical results confirm that our method improves the discriminative power of visual features for scientific imaging.
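
To make the attention-then-aggregation pipeline concrete, the following is a minimal NumPy sketch of the general idea, not the AAM as implemented in the paper. Several choices are assumptions made purely for illustration: the channel gate follows a squeeze-and-excitation style bottleneck, the K spatial maps come from a 1x1 projection followed by a softmax over positions, and aggregation is attention-weighted average pooling. All function names, weight shapes, and the reduction ratio `r` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feats, w1, w2):
    """Reweight channels of feats (C, H, W) with a learned gate in (0, 1).

    Assumed design: global average pool -> bottleneck MLP -> sigmoid.
    w1 has shape (C // r, C) and w2 has shape (C, C // r).
    """
    squeezed = feats.mean(axis=(1, 2))        # (C,) per-channel summary
    hidden = np.maximum(w1 @ squeezed, 0.0)   # ReLU bottleneck
    gate = sigmoid(w2 @ hidden)               # (C,) channel weights
    return feats * gate[:, None, None]

def spatial_attention_maps(feats, w_spatial):
    """Produce K spatial attention maps of shape (K, H, W).

    Assumed design: a 1x1 projection with weights w_spatial (K, C),
    then a softmax over the H*W positions of each map.
    """
    C, H, W = feats.shape
    logits = w_spatial @ feats.reshape(C, H * W)     # (K, H*W)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    maps = np.exp(logits)
    maps = maps / maps.sum(axis=1, keepdims=True)    # each map sums to 1
    return maps.reshape(-1, H, W)

def aggregate(feats, maps):
    """Condense local features into K non-local descriptors of shape (K, C).

    Each descriptor is the attention-weighted average of the local features.
    """
    C, H, W = feats.shape
    K = maps.shape[0]
    return maps.reshape(K, H * W) @ feats.reshape(C, H * W).T

# Example with random (untrained) weights; all sizes are arbitrary.
rng = np.random.default_rng(0)
C, H, W, K, r = 16, 8, 8, 4, 4
feats = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
w_spatial = rng.standard_normal((K, C))

gated = channel_attention(feats, w1, w2)
maps = spatial_attention_maps(gated, w_spatial)
descriptors = aggregate(gated, maps)
print(descriptors.shape)  # (K, C): one descriptor per spatial attention map
```

In a trained network the weights would be learned jointly with the rest of the CNN, and a block of this kind could be inserted after several stages so that the enhanced features cover multiple scales.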
