A Pre-trained Matching Model Based on Self- and Inter-ensemble For Product Matching Task

The product matching task aims to identify whether a pair of product offers from different websites refer to the same product. While the accumulated semantic annotations of products make it possible to study deep neural network-based matching methods, product matching remains challenging due to class imbalance and the heterogeneity of textual descriptions. In this paper, we treat product matching directly as a semantic text matching problem and propose a pre-trained matching model based on both self- and inter-ensemble. BERT is the main module in our approach for the binary classification of product pairs. We apply two types of ensemble methods: self-ensemble, using stochastic weight averaging (SWA) over checkpoints of the same model, and inter-ensemble, combining the predictions of different models. Additionally, we adopt the focal loss to alleviate the imbalance between positive and negative samples. Experimental results show that our model outperforms existing deep learning matching approaches. The proposed model achieves an F1-score of 85.94% on the test data, ranking second in Task One of the SWC2020 challenge on Mining the Web of HTML-embedded Product Data. Our implementation has been released.
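To make the three ingredients above concrete, the following is a minimal PyTorch sketch, not the authors' released implementation: it assumes a single-logit BERT pair classifier (the model name "bert-base-uncased", the hyperparameters, and the helper names predict_proba / ensemble_predict are our own illustrative choices). It shows the binary focal loss, SWA self-ensemble via torch.optim.swa_utils, and inter-ensemble by averaging predicted probabilities.

    import torch
    import torch.nn.functional as F
    from torch.optim.swa_utils import AveragedModel
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Binary focal loss (Lin et al., 2017): down-weights easy examples so the
    # loss concentrates on hard, often minority-class, product pairs.
    def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
        bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
        p = torch.sigmoid(logits)
        p_t = p * targets + (1 - p) * (1 - targets)            # prob. of the true class
        alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
        return (alpha_t * (1 - p_t) ** gamma * bce).mean()

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=1)                     # one logit: match / no match

    # Self-ensemble: SWA keeps a running average of the weights visited
    # late in training instead of a single final checkpoint.
    swa_model = AveragedModel(model)
    # ... inside the training loop, once per epoch in the averaging phase:
    #     swa_model.update_parameters(model)

    def predict_proba(m, titles_a, titles_b):
        """Match probability for product-description pairs encoded as one sequence."""
        batch = tokenizer(titles_a, titles_b, padding=True, truncation=True,
                          return_tensors="pt")
        with torch.no_grad():
            return torch.sigmoid(m(**batch).logits.squeeze(-1))

    # Inter-ensemble: average the probabilities of several trained models
    # (e.g. the SWA model plus differently seeded or differently pre-trained ones).
    def ensemble_predict(models, titles_a, titles_b, threshold=0.5):
        probs = torch.stack([predict_proba(m, titles_a, titles_b) for m in models])
        return (probs.mean(dim=0) > threshold).long()

Averaging probabilities rather than hard votes keeps the ensemble's decision threshold tunable on validation data, which matters when positives are rare.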
