Don't Classify, Translate: Multi-Level E-Commerce Product Categorization Via Machine Translation

E-commerce platforms categorize their products into a multi-level taxonomy tree with thousands of leaf categories. Conventional methods for product categorization are typically based on machine learning classification algorithms. These algorithms take product information as input (e.g., titles and descriptions) to classify a product into a leaf category. In this paper, we propose a new paradigm based on machine translation. In our approach, we translate a product's natural language description into a sequence of tokens representing a root-to-leaf path in a product taxonomy. In our experiments on two large real-world datasets, we show that our approach achieves better predictive accuracy than a state-of-the-art classification system for product categorization. In addition, we demonstrate that our machine translation models can propose meaningful new paths between previously unconnected nodes in a taxonomy tree, thereby transforming the taxonomy into a directed acyclic graph (DAG). We discuss how the resultant taxonomy DAG promotes user-friendly navigation, and how it is more adaptable to new products.
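Below is a minimal, purely illustrative sketch of the translation framing described above: product titles act as the "source language" and serialized root-to-leaf taxonomy paths act as the "target language", so that any standard sequence-to-sequence model (e.g., an attention-based encoder-decoder) could be trained on the resulting parallel pairs. The taxonomy, example products, and the ">" path separator are hypothetical choices for illustration, not the paper's actual data or implementation.

```python
# Illustrative sketch: casting product categorization as machine translation.
# Source "sentences" are product titles; target "sentences" are root-to-leaf
# taxonomy paths serialized as token sequences. All data here is hypothetical.

from typing import List, Tuple

# Hypothetical taxonomy: each leaf id maps to its root-to-leaf path.
TAXONOMY_PATHS = {
    "headphones": ["Electronics", "Audio", "Headphones"],
    "running_shoes": ["Sports", "Footwear", "Running Shoes"],
}


def to_target_sequence(path: List[str]) -> str:
    """Serialize a root-to-leaf path into the token sequence a decoder would emit.

    Each category node becomes one target token; '>' is kept as an explicit
    separator so a decoded string can be parsed back into a path.
    """
    return " > ".join(path)


def build_parallel_corpus(products: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn (title, leaf_category_id) pairs into (source, target) training pairs."""
    corpus = []
    for title, leaf in products:
        source = title.lower()                               # source "sentence"
        target = to_target_sequence(TAXONOMY_PATHS[leaf])    # target "sentence"
        corpus.append((source, target))
    return corpus


if __name__ == "__main__":
    products = [
        ("Wireless Noise-Cancelling Over-Ear Headphones", "headphones"),
        ("Lightweight Trail Running Shoes, Size 10", "running_shoes"),
    ]
    for src, tgt in build_parallel_corpus(products):
        print(f"SRC: {src}\nTGT: {tgt}\n")
```

One consequence of decoding paths token by token, as the abstract notes, is that the model is not restricted to paths already present in the taxonomy: a decoded sequence can connect previously unconnected nodes, which is how the approach can extend the taxonomy tree into a DAG.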
