A Commodity Classification Framework Based on Machine Learning for Analysis of Trade Declaration

Text, voice, images, and video all convey intentions and facts in daily life, and understanding this content makes it possible to identify and analyze behavior. This paper focuses on the commodity trade declaration process and identifies commodity categories from the text of customs declarations. Although text recognition technology is mature in many application fields, few studies address the classification and recognition of goods in customs declarations. We propose a classification framework for commodity trade declarations, based on machine learning (ML) models, that achieves high accuracy. We also propose a symmetrical decision fusion method for this task based on a convolutional neural network (CNN) and a transformer. The experimental results show that the fusion model compensates for the shortcomings of the two original models and yields some improvement over them. On the two datasets used in this paper, accuracy reaches 88% and 99%, respectively. To promote research on customs declaration processing and Chinese text recognition, we also release the proprietary datasets used in this study.
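The abstract does not spell out how the decision fusion works, so the following is only a minimal sketch of one common form of decision-level fusion: each model's class scores are converted to probabilities and averaged with equal weights, so neither model is favored. The function names, the equal weighting, and the example scores are assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_decisions(cnn_logits, transformer_logits):
    """Hypothetical equal-weight fusion of two classifiers' outputs.

    Assumption: both models score the same ordered set of commodity
    classes. Returns the predicted class index and the fused probabilities.
    """
    if len(cnn_logits) != len(transformer_logits):
        raise ValueError("both models must score the same classes")
    p_cnn = softmax(cnn_logits)
    p_transformer = softmax(transformer_logits)
    # Equal weights: swapping the two inputs gives the same result.
    fused = [(a + b) / 2.0 for a, b in zip(p_cnn, p_transformer)]
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused


# Illustrative scores for three classes (made-up numbers).
label, fused = fuse_decisions([2.0, 1.0, 0.0], [0.0, 0.5, 3.0])
print(label, [round(p, 3) for p in fused])
```

In this example the CNN alone would pick class 0 and the transformer alone would pick class 2; the averaged distribution follows the more confident model. A real system might instead learn the fusion weights or fuse at a different stage.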
