Clothes detection and classification using convolutional neural networks

In this paper we describe the development of a computer vision system for accurate detection and classification of clothes in e-commerce images. We present a set of experiments on well-established convolutional neural network architectures, including Residual networks, SqueezeNet and Single Shot MultiBox Detector (SSD). The clothes detection network was trained and tested on the DeepFashion dataset, which contains bounding box annotations for the locations of clothes. The classification task was evaluated on a set of images of dresses collected from online shops. Ground truth labels were inferred from shop item metadata for five attributes (color, pattern, sleeve, neckline and hemline), each consisting of several possible classes. Gathering labels automatically in this way yielded correct labels at an average rate of 83%. In the experiments we evaluate the impact on classification accuracy of a set of potential improvements, including data augmentation by generating diverse backgrounds, increasing the size of the network and using ensembles. We analyse the accuracy improvements in relation to processing efficiency. Finally, we present the accuracy rates achieved in the clothes detection task and outline the most successful network configurations for dress classification.
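As an illustration of the ensemble idea mentioned above, the following is a minimal sketch that assumes the ensemble combines models by averaging their per-class probabilities; the abstract does not state how the networks are actually combined. The function names, the sleeve class names and the probability values are all hypothetical.

```python
from typing import List, Sequence


def average_probabilities(prob_sets: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class probabilities over several models (assumed ensemble rule)."""
    if not prob_sets:
        raise ValueError("at least one model output is required")
    n_classes = len(prob_sets[0])
    if any(len(p) != n_classes for p in prob_sets):
        raise ValueError("all models must output the same number of classes")
    n_models = len(prob_sets)
    return [sum(p[i] for p in prob_sets) / n_models for i in range(n_classes)]


def ensemble_predict(prob_sets: Sequence[Sequence[float]],
                     classes: Sequence[str]) -> str:
    """Return the class whose averaged probability is highest."""
    avg = average_probabilities(prob_sets)
    if len(avg) != len(classes):
        raise ValueError("class list does not match model output size")
    best = max(range(len(avg)), key=avg.__getitem__)
    return classes[best]


# Hypothetical classes for the "sleeve" attribute and made-up outputs
# from three classifiers for a single dress image.
SLEEVE_CLASSES = ["sleeveless", "short", "long"]
model_outputs = [
    [0.5, 0.3, 0.2],
    [0.2, 0.5, 0.3],
    [0.1, 0.6, 0.3],
]
predicted = ensemble_predict(model_outputs, SLEEVE_CLASSES)
```

Here the first model alone would pick "sleeveless", but the averaged probabilities favour "short", which shows how an ensemble can override a single model's error. The cost is one forward pass per model, which is the accuracy versus efficiency trade-off the abstract refers to.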
