Large-Scale Categorization of Japanese Product Titles Using Neural Attention Models

We propose a variant of the Convolutional Neural Network (CNN), the Attention CNN (ACNN), for large-scale categorization of millions of Japanese items into thirty-five product categories. Compared to a state-of-the-art Gradient Boosted Tree (GBT) classifier, the proposed model reduces training time from three weeks to three days while maintaining more than 96% accuracy. In addition, the model characterizes products by assigning attention weights to word tokens in a language-agnostic way. The words that receive high attention have been observed to be semantically highly correlated with the predicted categories, and they offer an option for automatic feature extraction in downstream processing.
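The abstract does not spell out the ACNN architecture, so the following is only a generic illustration of soft attention over word tokens, not the authors' model. It shows how per-token scores can be normalized into attention weights, used to pool token features, and read off as "attention words". The function names, the example title tokens, the two-dimensional feature vectors, and the context vector are all made up for illustration; in a trained model the features would presumably come from convolution layers and the context vector would be learned.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Normalize raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    token_vectors: List[List[float]], context: List[float]
) -> Tuple[List[float], List[float]]:
    """Score each token against a context vector and pool by attention weight."""
    # Dot-product score between each token vector and the context vector.
    scores = [
        sum(v * c for v, c in zip(vec, context)) for vec in token_vectors
    ]
    weights = softmax(scores)
    dim = len(context)
    # Attention-weighted sum of token vectors, one value per feature dimension.
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, token_vectors))
        for d in range(dim)
    ]
    return weights, pooled


def top_attention_tokens(
    tokens: List[str], weights: List[float], k: int = 2
) -> List[str]:
    """Return the k tokens with the highest attention weight."""
    ranked = sorted(zip(tokens, weights), key=lambda p: p[1], reverse=True)
    return [tok for tok, _ in ranked[:k]]


# Hypothetical tokenized product title and made-up token features.
tokens = ["送料無料", "ワイヤレス", "イヤホン", "黒"]
token_vectors = [[0.1, 0.0], [0.9, 0.4], [1.2, 0.8], [0.2, 0.1]]
context = [1.0, 1.0]  # stands in for a learned context vector

weights, pooled = attention_pool(token_vectors, context)
top = top_attention_tokens(tokens, weights, k=2)
```

In this toy setup the product-type tokens receive the largest weights, while a generic token such as a shipping notice receives little. The `pooled` vector is what a classifier layer would consume, and `top` is the kind of token list that could be passed on as extracted features.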
