Atlas: A Dataset and Benchmark for E-commerce Clothing Product Categorization

In e-commerce, it is common practice to organize the product catalog using a product taxonomy. This enables buyers to easily locate the item they are looking for and to explore the various items available under a category. A product taxonomy is a tree structure with 3 or more levels of depth and several leaf nodes. Product categorization is a large-scale classification task that assigns a category path to a particular product. Research in this area is restricted by the unavailability of good real-world datasets and by variations in taxonomy caused by the absence of a standard across different e-commerce stores. In this paper, we introduce a high-quality product taxonomy dataset focusing on clothing products, which contains 186,150 images under the clothing category, organized in a taxonomy with 3 levels and 52 leaf nodes. We explain the methodology used to collect and label this dataset. Further, we establish a benchmark by comparing image classification models and attention-based sequence models for predicting the category path. Our benchmark model reaches a micro F-score of 0.92 on the test set. The dataset, code and pre-trained models are publicly available at \url{this https URL}. We invite the community to improve upon these baselines.
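The abstract treats each label as a category path through a three-level tree and reports a micro F-score. The Python sketch below shows one way to represent, validate and score such paths. The taxonomy labels, the <start>/<end> tokens and all function names are hypothetical, and scoring micro F1 over path nodes (rather than over leaf labels only) is an assumption, since the abstract does not state the exact evaluation protocol.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical 3-level fragment of a clothing taxonomy (illustrative labels,
# NOT the actual Atlas label set): level 1 -> level 2 -> list of leaf nodes.
TAXONOMY: Dict[str, Dict[str, List[str]]] = {
    "Men": {"Topwear": ["T-Shirts", "Shirts"], "Bottomwear": ["Jeans", "Shorts"]},
    "Women": {"Topwear": ["Tops", "Shirts"], "Dresses": ["Maxi Dresses", "Mini Dresses"]},
}


def is_valid_path(path: Sequence[str]) -> bool:
    """True if `path` is an existing root-to-leaf path in TAXONOMY."""
    if len(path) != 3:
        return False
    level1, level2, leaf = path
    return leaf in TAXONOMY.get(level1, {}).get(level2, [])


def decoder_target(path: Sequence[str]) -> List[str]:
    """Token sequence a sequence model could be trained to emit, one tree node
    per decoding step; the <start>/<end> markers are assumed conventions."""
    return ["<start>", *path, "<end>"]


def path_nodes(path: Sequence[str]) -> Set[Tuple[str, ...]]:
    """Identify each node by its prefix from the root, so 'Topwear' under
    'Men' and 'Topwear' under 'Women' count as different nodes."""
    return {tuple(path[: i + 1]) for i in range(len(path))}


def micro_f1(gold: Sequence[Sequence[str]], pred: Sequence[Sequence[str]]) -> float:
    """Micro-averaged F1 over path nodes: TP/FP/FN are pooled across all
    examples before precision and recall are computed."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g_nodes, p_nodes = path_nodes(g), path_nodes(p)
        tp += len(g_nodes & p_nodes)
        fp += len(p_nodes - g_nodes)
        fn += len(g_nodes - p_nodes)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [("Men", "Topwear", "T-Shirts"), ("Women", "Dresses", "Maxi Dresses")]
pred = [("Men", "Topwear", "Shirts"), ("Women", "Dresses", "Maxi Dresses")]

print(all(is_valid_path(p) for p in pred))  # True
print(decoder_target(gold[0]))  # ['<start>', 'Men', 'Topwear', 'T-Shirts', '<end>']
print(round(micro_f1(gold, pred), 4))  # 0.8333
```

Here the first prediction gets the two upper levels right but the wrong leaf, so it earns partial credit (5 of the 6 gold nodes are recovered overall). If each product were instead scored on its leaf label alone, as in ordinary single-label classification, micro F1 would reduce to plain accuracy, which is 0.5 for this pair.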
