Adding Quaternion Representations to Attention Networks for Classification

This paper introduces a novel modification to axial-attention networks that improves their image classification accuracy: the axial-attention modules are supplemented with quaternion input representations. We chose axial-attention networks because they factor each 2D attention operation into two consecutive 1D operations (similar to separable convolution) and are therefore less resource-intensive than non-axial attention networks. We chose a quaternion encoder because quaternion layers share weights across four real-valued input channels, and this weight sharing has been shown to produce more interlinked output representations. We hypothesize that an attention module can be more effective when it receives these interlinked representations as input. Our experiments support this hypothesis: the modified networks achieve higher classification accuracy than standard axial-attention networks. We attribute this improvement to the attention modules having better input representations to work with.
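The two ingredients described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation and rests on simplifying assumptions: single-head attention without the positional terms used in published axial-attention models, and a hypothetical `quaternion_axial_block` that applies a pointwise quaternion projection before one height pass and one width pass. The illustrative helper `hamilton_weight` builds the real-valued matrix of the quaternion (Hamilton) product, which makes the weight sharing visible: four weight arrays fill all sixteen blocks.

```python
import numpy as np


def hamilton_weight(r, i, j, k):
    """Real block matrix of a quaternion linear map (Hamilton product W x).

    r, i, j, k: arrays of shape (out_q, in_q), the four components of the
    quaternion weights. Every block row reuses the same four arrays, so the
    layer has 4 * out_q * in_q parameters instead of the 16 * out_q * in_q
    of a real dense layer with the same widths.
    Channels are laid out as contiguous [real | i | j | k] groups.
    """
    return np.block([
        [r, -i, -j, -k],
        [i, r, -k, j],
        [j, k, r, -i],
        [k, -j, i, r],
    ])


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_1d(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over axis -2 of x (..., L, C)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])  # (..., L, L)
    return softmax(scores, axis=-1) @ v


def axial_attention(x, params_h, params_w):
    """Factor 2D self-attention on x (H, W, C) into a height pass, then a width pass."""
    cols = np.swapaxes(x, 0, 1)           # (W, H, C): each column is a length-H sequence
    cols = attention_1d(cols, *params_h)  # attend along height within each column
    x = np.swapaxes(cols, 0, 1)           # back to (H, W, C')
    return attention_1d(x, *params_w)     # attend along width within each row


def quaternion_axial_block(x, quat_weights, params_h, params_w):
    """Hypothetical block (illustrative only): pointwise quaternion projection, then axial attention."""
    w = hamilton_weight(*quat_weights)    # (4 * out_q, 4 * in_q)
    return axial_attention(x @ w.T, params_h, params_w)


# Usage on a random 8x6 feature map with 4 * in_q = 8 input channels.
rng = np.random.default_rng(0)
H, W, in_q, out_q, d = 8, 6, 2, 4, 16
x = rng.standard_normal((H, W, 4 * in_q))
quat = [rng.standard_normal((out_q, in_q)) for _ in range(4)]
ph = [rng.standard_normal((4 * out_q, d)) for _ in range(3)]
pw = [rng.standard_normal((d, d)) for _ in range(3)]
y = quaternion_axial_block(x, quat, ph, pw)
print(y.shape)  # (8, 6, 16)
```

For an H×W feature map, full 2D self-attention computes (HW)^2 attention scores, whereas the two axial passes compute HW(H + W) in total, which is the source of the resource savings mentioned above.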
