Dataset Construction via Attention for Aspect Term Extraction with Distant Supervision

Aspect Term Extraction (ATE) detects opinionated aspect terms in sentences or text spans, with the end goal of performing aspect-based sentiment analysis. The small number of datasets available for supervised ATE, and the fact that they cover only a few domains, raise the need to exploit other data sources in new and creative ways. Publicly available review corpora contain a plethora of opinionated aspect terms and cover a larger domain spectrum. In this paper, we first propose a method for using such review corpora to create a new dataset for ATE. Our method relies on an attention mechanism to select sentences that have a high likelihood of containing actual opinionated aspects, which improves the quality of the extracted aspects. We then use the constructed dataset to train a model and perform ATE with distant supervision. By evaluating on human-annotated datasets, we show that our method achieves significantly improved performance over various unsupervised and supervised baselines. Finally, we show that sentence selection matters when creating new datasets for ATE: using a set of selected sentences leads to higher ATE performance than using the whole sentence set.
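The sentence-selection idea can be illustrated with a minimal sketch. It is not the selection rule used in this work, which the abstract does not specify. It assumes that some attention model has already produced one weight per token, and it keeps a sentence only when its peak weight reaches a cutoff. The function name `select_sentences`, the `threshold` parameter, and the peak-weight criterion are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def select_sentences(
    sentences: List[List[str]],
    attention: List[List[float]],
    threshold: float = 0.5,
) -> List[Tuple[List[str], str]]:
    """Keep sentences whose highest attention weight reaches `threshold`.

    Assumption: `attention[i][j]` is a precomputed weight for token j of
    sentence i. Returns (tokens, candidate_term) pairs, where the candidate
    term is the token carrying the highest weight.
    """
    selected = []
    for tokens, weights in zip(sentences, attention):
        if not tokens or len(tokens) != len(weights):
            continue  # skip empty or misaligned inputs
        # Index of the token with the largest attention weight.
        peak = max(range(len(weights)), key=weights.__getitem__)
        if weights[peak] >= threshold:
            selected.append((tokens, tokens[peak]))
    return selected


# Toy input with hand-written weights (illustrative only, not real model output).
sentences = [
    ["the", "battery", "lasts", "long"],
    ["i", "bought", "it", "yesterday"],
]
attention = [
    [0.05, 0.80, 0.10, 0.05],  # sharply peaked on "battery"
    [0.25, 0.25, 0.25, 0.25],  # flat: no clear candidate term
]
kept = select_sentences(sentences, attention, threshold=0.5)
```

In this toy input only the first sentence is kept, with "battery" as its candidate term; the second sentence is discarded because its weights are flat. A dataset built from the kept sentences would then serve as the distantly supervised training set.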
