MAVE: A Product Dataset for Multi-source Attribute Value Extraction

Attribute value extraction refers to the task of identifying values of an attribute of interest from product information. Product attribute values are essential in many e-commerce scenarios, such as customer service robots, product ranking, retrieval, and recommendation. In the real world, however, the attribute values of a product are usually incomplete and vary over time, which greatly hinders practical applications. In this paper, we introduce MAVE, a new dataset to better facilitate research on product attribute value extraction. MAVE is composed of a curated set of 2.2 million products from Amazon pages, with 3 million attribute-value annotations across 1257 unique categories. MAVE has four main advantages. First, MAVE is the largest product attribute value extraction dataset by the number of attribute-value examples. Second, MAVE includes multi-source representations of each product, which capture the full product information with high attribute coverage. Third, MAVE represents a more diverse set of attributes and values than previous datasets cover. Lastly, MAVE provides a very challenging zero-shot test set, as we empirically illustrate in the experiments. We further propose a novel approach that effectively extracts attribute values from the multi-source product information. We conduct extensive experiments with several baselines and show that MAVE is an effective dataset for the attribute value extraction task, and that zero-shot attribute extraction on it remains very challenging. Data is available at https://github.com/google-research-datasets/MAVE.

[30]  Derek Hoiem,et al.  Learning without Forgetting , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Chandan K. Reddy,et al.  Language-Agnostic Representation Learning for Product Search on E-Commerce Platforms , 2020, WSDM.

[32]  Hady W. Lauw,et al.  Explainable Recommendation with Comparative Constraints on Product Aspects , 2021, WSDM.

[33]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[34]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[35]  David Carmel,et al.  Product Question Answering Using Customer Generated Content - Research Challenges , 2018, SIGIR.

[36]  Xipeng Qiu,et al.  A Unified Generative Framework for Various NER Subtasks , 2021, ACL.

[37]  Dongyan Zhao,et al.  Product-Aware Answer Generation in E-Commerce Question-Answering , 2019, WSDM.

[38]  Christian Bizer,et al.  Extracting attribute-value pairs from product specifications on the web , 2017, WI.

[39]  Feifei Li,et al.  OpenTag: Open Attribute Value Extraction from Product Profiles , 2018, KDD.

[40]  Frederick Reiss,et al.  Domain Adaptation of Rule-Based Annotators for Named-Entity Recognition Tasks , 2010, EMNLP.

[41]  Andrew McCallum,et al.  Modeling Relations and Their Mentions without Labeled Text , 2010, ECML/PKDD.

[42]  Rayid Ghani,et al.  Semi-Supervised Learning of Attribute-Value Pairs from Product Descriptions , 2007, IJCAI.

[43]  Jie Zhao,et al.  Riker: Mining Rich Keyword Representations for Interpretable Product Question Answering , 2019, KDD.

[44]  Qi Li,et al.  Recognizing Salient Entities in Shopping Queries , 2016, ACL.

[45]  Elinda Kajo Mece,et al.  Question Answering Systems: A Review on Present Developments, Challenges and Trends , 2017 .

[46]  Christos Mousas,et al.  DenserNet: Weakly Supervised Visual Localization Using Multi-scale Feature Aggregation , 2021, AAAI.

[47]  Sameer Singh,et al.  Multimodal Attribute Extraction , 2017, AKBC@NIPS.

[48]  Hanghang Tong,et al.  Unsupervised Attributed Network Embedding via Cross Fusion , 2021, WSDM.

[49]  Satoshi Sekine,et al.  A survey of named entity recognition and classification , 2007 .

[50]  Qifan Wang,et al.  Constructing a Comprehensive Events Database from the Web , 2019, CIKM.

[51]  Wei Xu,et al.  Bidirectional LSTM-CRF Models for Sequence Tagging , 2015, ArXiv.

[52]  Li Yang,et al.  ETC: Encoding Long and Structured Inputs in Transformers , 2020, EMNLP.

[53]  W. Bruce Croft,et al.  Learning a Hierarchical Embedding Model for Personalized Product Search , 2017, SIGIR.

[54]  Pawan Goyal,et al.  Attribute Value Generation from Product Title using Language Models , 2021, ECNLP.

[55]  Lidan Shou,et al.  EXACT: Attributed Entity Extraction By Annotating Texts , 2019, SIGIR.

[56]  A. Nugaliyadde,et al.  Advances in Natural Language Question Answering: A Review , 2019, ArXiv.

[57]  Yue Wang,et al.  Multimodal Joint Attribute Prediction and Value Extraction for E-commerce Product , 2020, EMNLP.

[58]  Meng Liu,et al.  LARA: Attribute-to-feature Adversarial Learning for New-item Recommendation , 2020, WSDM.

[59]  Satoshi Sekine,et al.  Unsupervised Extraction of Attributes and Their Values from Product Description , 2013, IJCNLP.

[60]  Rayid Ghani,et al.  Text mining for product attribute extraction , 2006, SKDD.

[61]  Junling Hu,et al.  Bootstrapped Named Entity Recognition for Product Attribute Extraction , 2011, EMNLP.

[62]  Xinyu Jiang,et al.  Scaling up Open Tagging from Tens to Thousands: Comprehension Empowered Attribute Value Extraction from Product Title , 2019, ACL.