Intermediate Training of BERT for Product Matching

Transformer-based models like BERT have pushed the state of the art for a wide range of natural language processing tasks. General-purpose pre-training on large corpora allows Transformers to perform well even when only small amounts of training data are available for task-specific fine-tuning. In this work, we apply BERT to the task of product matching in e-commerce and show that it is considerably more training-data efficient than other state-of-the-art methods. Moreover, we show that its effectiveness can be boosted further through an intermediate training step that exploits large collections of product offers. This intermediate training leads to strong performance (>90% F1) on new, unseen products without any product-specific fine-tuning. Further fine-tuning yields additional gains, resulting in improvements of up to 12% F1 for small training sets. Adding the masked language modeling objective to the intermediate training step, in order to further adapt the language model to the application domain, leads to an additional increase of up to 3% F1.

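Below is a minimal sketch of the pair-classification setup the abstract describes, using the HuggingFace Transformers library. The checkpoint name, example offers, and label are illustrative assumptions, not the paper's exact configuration; the intermediate training step would run this same matching objective (optionally combined with masked language modeling) over large collections of product offers before any product-specific fine-tuning.

```python
# Minimal sketch: BERT as a binary matcher over product-offer pairs.
# Checkpoint, example offers, and label are assumptions for illustration.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # match / non-match
)

# An offer pair is encoded as one sequence: [CLS] offer_a [SEP] offer_b [SEP]
offer_a = "Apple iPhone 8 64GB Space Gray"
offer_b = "iPhone 8 64 GB - Grey (Unlocked)"
inputs = tokenizer(offer_a, offer_b, truncation=True, return_tensors="pt")

labels = torch.tensor([1])  # 1 = same product
outputs = model(**inputs, labels=labels)
outputs.loss.backward()  # cross-entropy loss; an optimizer step would follow
```

Intermediate training in this spirit iterates such steps over offer pairs drawn from a large product corpus; the resulting model can then be fine-tuned on a small task-specific training set, or applied directly to new, unseen products.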