A Multimodal Late Fusion Model for E-Commerce Product Classification

Cataloging product listings is a fundamental problem for most e-commerce platforms. Although unimodal methods have produced promising results, their performance can likely be improved further by using multimodal product information. In this study, we investigate a multimodal late fusion approach that uses text and image modalities to categorize e-commerce products on Rakuten. Specifically, we develop a modality-specific, state-of-the-art deep neural network for each input modality and then fuse their outputs at the decision level. Experimental results on the Multimodal Product Classification Task of the SIGIR 2020 E-Commerce Workshop Data Challenge show that the proposed method outperforms unimodal and other multimodal methods. Our team, pa_curis, won first place with a macro-F1 of 0.9144 on the final leaderboard.
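To make the idea of decision-level (late) fusion concrete, here is a minimal sketch in Python. The abstract does not state the exact fusion rule, so the weighted average of class probabilities used here is an assumption chosen for illustration, and the names `softmax`, `late_fusion`, and `text_weight` are hypothetical. Each modality-specific model is assumed to emit one row of class logits per product.

```python
import numpy as np


def softmax(logits):
    """Convert a 2-D array of logits (samples x classes) to probabilities."""
    # Subtract the row-wise maximum for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def late_fusion(text_logits, image_logits, text_weight=0.5):
    """Fuse two classifiers at the decision level.

    Assumption: fusion is a weighted average of per-class probabilities.
    The source does not specify its fusion rule; this is one common choice.
    """
    text_probs = softmax(np.asarray(text_logits, dtype=float))
    image_probs = softmax(np.asarray(image_logits, dtype=float))
    fused = text_weight * text_probs + (1.0 - text_weight) * image_probs
    # Predicted class per sample, plus the fused probabilities.
    return fused.argmax(axis=1), fused


# Toy example: 2 products, 3 categories (values are made up).
text_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]]
image_logits = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
labels, probs = late_fusion(text_logits, image_logits, text_weight=0.5)
```

In this sketch, `text_weight` controls how much the text model is trusted relative to the image model. In practice such a weight would typically be tuned on a held-out validation set.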