Improving Fashion Landmark Detection by Dual Attention Feature Enhancement

Fashion landmark detection is a fundamental problem in visual fashion analysis, which aims to locate the precise coordinates of functional key points defined on clothes. Dozens of deep learning-based methods have been proposed to address this problem. Extracting adequate and effective features is critical for this challenging task. In this paper, we propose the Dual Attention Feature Enhancement (DAFE) module, which strengthens the extracted features by adaptively reusing low-level image details and emphasizing informative parts. First, DAFE enhances pixel-wise information by capturing spatial details from low-level features under the guidance of an attention matrix generated from high-level features. Second, DAFE emphasizes task-related features by modeling long-range relationships between channels. Extensive experiments on the DeepFashion and FLD datasets demonstrate that our method achieves state-of-the-art performance, and our approach also achieves competitive results on the DeepFashion2 Landmark Estimation Challenge.
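The two steps described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify layer shapes, normalizations, or how the branches are fused, so the 1x1 projection `w`, the sigmoid gate, the softmax channel affinity, the residual additions, and all function names here are assumptions chosen only to show the general idea of high-level features gating low-level details, followed by channel-to-channel reweighting.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_enhance(low, high, w):
    """Reuse low-level details under an attention map derived from high-level features.

    low, high: arrays of shape (C, H, W), assumed already at the same resolution.
    w: hypothetical (C,) weights acting as a 1x1 convolution that maps
       the high-level features to a single-channel attention matrix.
    """
    attn = sigmoid(np.tensordot(w, high, axes=([0], [0])))  # (H, W), values in (0, 1)
    return high + attn[None, :, :] * low                    # gated low-level details

def channel_enhance(feat):
    """Reweight channels using pairwise channel affinities (assumed formulation)."""
    c, h, w_ = feat.shape
    flat = feat.reshape(c, h * w_)                               # (C, HW)
    affinity = softmax(flat @ flat.T / np.sqrt(h * w_), axis=-1) # (C, C), rows sum to 1
    return feat + (affinity @ flat).reshape(c, h, w_)            # residual connection

def dafe_sketch(low, high, w):
    # Spatial enhancement first, then channel enhancement (order is an assumption).
    return channel_enhance(spatial_enhance(low, high, w))
```

For example, calling `dafe_sketch(low, high, w)` with two random `(4, 8, 8)` feature maps and a length-4 weight vector returns an enhanced feature map of the same shape. In a real network the projection weights would be learned and the low-level features would typically need resizing to match the high-level ones.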
