Towards More Robust Fashion Recognition by Combining of Deep-Learning-Based Detection with Semantic Reasoning

The company FutureTV produces and distributes its own videos in the fashion domain and generates revenue by placing relevant advertising. Placing suitable ads, however, requires an understanding of the contents of the videos. Until now, this tagging has been done manually in a labor-intensive process. We believe that image recognition technologies can significantly reduce the need for manual involvement in the tagging process. However, the tagging of videos comes with additional challenges: primarily, new deep-learning models need to be trained on vast amounts of data obtained in a labor-intensive data-collection process. We suggest a new approach that combines deep-learning-based recognition with a semantic reasoning engine. We argue that explicitly declaring knowledge about the fashion categories present in the training data of the recognition system makes it possible to refine the recognition results and gain additional knowledge beyond what the neural network itself provides.
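
The idea of layering declared knowledge on top of detector output can be illustrated with a minimal Python sketch. It is not the system described here: the category names, the `SUPERCLASS` taxonomy, the `RULES` table, and the `infer_tags` function are all hypothetical, and a plain dictionary lookup stands in for a real semantic reasoning engine. The sketch only shows how detected labels might be extended with broader categories and with tags that no single detection provides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detector output: a category label and its confidence score."""
    label: str
    score: float


# Hypothetical taxonomy: detector category -> broader category.
SUPERCLASS = {
    "blazer": "outerwear",
    "trench_coat": "outerwear",
    "tie": "accessory",
    "sneakers": "footwear",
}

# Hypothetical rules: if every label in the set was detected, add the tag.
RULES = [
    (frozenset({"blazer", "tie"}), "business_look"),
    (frozenset({"sneakers"}), "casual_element"),
]


def infer_tags(detections, min_score=0.5):
    """Return detected labels plus tags derived from the declared knowledge."""
    # Keep only detections the (assumed) detector is reasonably sure about.
    labels = {d.label for d in detections if d.score >= min_score}
    tags = set(labels)

    # Add the broader category of each detected label, if one is declared.
    for label in labels:
        parent = SUPERCLASS.get(label)
        if parent is not None:
            tags.add(parent)

    # Add tags whose required labels are all present.
    for required, tag in RULES:
        if required <= labels:
            tags.add(tag)

    return tags


if __name__ == "__main__":
    frame = [
        Detection("blazer", 0.91),
        Detection("tie", 0.78),
        Detection("sneakers", 0.30),  # below threshold, ignored
    ]
    print(sorted(infer_tags(frame)))
```

In this sketch the tag `business_look` is never emitted by the detector; it follows only from the declared rule, which is the kind of additional knowledge the approach aims for. A real reasoning engine would typically express such knowledge in an ontology language rather than in Python dictionaries.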
