A Deep Forest Method for Classifying E-Commerce Products by Using Title Information

E-commerce platforms such as Amazon, eBay and Tmall are flooded with many types of products. These platforms need to classify the products to facilitate product management and recommendation, but doing so manually can be very costly. Recently, machine learning (ML) based classification techniques, e.g. support vector machines (SVM) and deep learning (DL), have been widely used in industry to classify e-commerce products using the text of the titles given by the merchants. However, current techniques can be inefficient and inaccurate when the number of categories is large and the amount of data is small, as is the case in the e-commerce product classification problem. In this paper, we propose a novel machine learning method for this problem, referred to as gcForest, which combines a cascade forest of decision trees with a multi-grained scanning mechanism. After preprocessing the product titles with the TF-IDF algorithm, a term-weighting technique, we carry out a series of experiments with 4000 samples belonging to 35 product categories. The experimental results show that gcForest achieves a classification accuracy of 92.38%, outperforming SVM with an RBF kernel (86.88%), SVM with a linear kernel (89.73%) and CNN (86.86%).
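
The TF-IDF preprocessing step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the unsmoothed idf = log(N / df) variant, the helper name `tfidf_vectors`, and the three sample titles are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(titles):
    """Turn product titles into TF-IDF vectors (hypothetical helper).

    Assumptions: titles are tokenized on whitespace after lowercasing,
    tf is the relative term frequency within a title, and
    idf = log(N / df) without smoothing.
    """
    docs = [title.lower().split() for title in titles]
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)

    # Document frequency: number of titles that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n_docs / df[tok]) for tok in vocab}

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # One weight per vocabulary entry; tokens absent from the title get 0.0.
        vectors.append(
            [(counts[tok] / total) * idf[tok] if total else 0.0 for tok in vocab]
        )
    return vocab, vectors


# Made-up sample titles, not data from the paper.
titles = [
    "red cotton shirt",
    "blue cotton shirt",
    "wireless bluetooth headphones",
]
vocab, vectors = tfidf_vectors(titles)
```

Each title becomes a fixed-length numeric vector over the shared vocabulary, which is the kind of input a classifier such as SVM or a cascade forest can consume. Tokens that appear in fewer titles (for example "red") receive a higher weight than tokens shared across titles (for example "cotton").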
