Matching unstructured product offers to structured product specifications

An e-commerce catalog typically comprises specifications for millions of products. The search engine receives millions of sales offers from thousands of independent merchants, and each offer must be matched to the right product. We describe the challenges that a system for matching unstructured offers to structured product descriptions must address, drawing on our experience building such a system for Bing Shopping. The heart of our system is a data-driven component that learns the matching function offline; the learned function is then applied at run time to match offers to products. We present the design of this and other critical components of the system, along with details of the extensive experiments we performed to assess the system's readiness. The system is currently deployed in an experimental Commerce Search Engine and is used to match all the offers received by Bing Shopping to the Bing product catalog.
