The USAGE review corpus for fine grained multi lingual opinion analysis

Opinion mining has received wide attention in recent years. Models for this task are typically trained or evaluated with a manually annotated dataset. However, fine-grained annotation of sentiments including information about aspects and their evaluation is very labour-intensive. The data available so far is limited. Contributing to this situation, this paper describes the Bielefeld University Sentiment Analysis Corpus for German and English (USAGE), which we offer freely to the community and which contains the annotation of product reviews from Amazon with both aspects and subjective phrases. It provides information on segments in the text which denote an aspect or a subjective evaluative phrase which refers to the aspect. Relations and coreferences are explicitly annotated. This dataset contains 622 English and 611 German reviews, allowing to investigate how to port sentiment analysis systems across languages and domains. We describe the methodology how the corpus was created and provide statistics including inter-annotator agreement. We further provide figures for a baseline system and results for German and English as well as in a cross-domain setting. The results are encouraging in that they show that aspects and phrases can be extracted robustly without the need of tuning to a particular type of products.

[1]  Oscar Täckström,et al.  Semi-supervised latent variable models for sentence-level sentiment analysis , 2011, ACL.

[2]  Philip S. Yu,et al.  A holistic lexicon-based approach to opinion mining , 2008, WSDM '08.

[3]  Philipp Cimiano,et al.  Bi-directional Inter-dependencies of Subjective Expressions and Targets and their Value for a Joint Model , 2013, ACL.

[4]  Xiaoyan Zhu,et al.  Sentiment Analysis with Global Topics and Local Dependency , 2010, AAAI.

[5]  Simon Clematide,et al.  MLSA - A Multi-layered Reference Corpus for German Sentiment Analysis , 2012, LREC.

[6]  Katarina Boland,et al.  Creating an Annotated Corpus for Sentiment Analysis of German Product Reviews , 2013 .

[7]  Ivan Titov,et al.  A Joint Model of Text and Aspect Ratings for Sentiment Summarization , 2008, ACL.

[8]  Preslav Nakov,et al.  SemEval-2013 Task 2: Sentiment Analysis in Twitter , 2013, *SEMEVAL.

[9]  Peter D. Turney Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews , 2002, ACL.

[10]  Claire Cardie,et al.  Annotating Expressions of Opinions and Emotions in Language , 2005, Lang. Resour. Evaluation.

[11]  Philip V. Ogren,et al.  Knowtator: A Protégé plug-in for annotated corpus construction , 2006, NAACL.

[12]  Amélie Marian,et al.  Beyond the Stars: Improving Rating Predictions using Review Text Content , 2009, WebDB.

[13]  Swapna Somasundaran,et al.  Finding the Sources and Targets of Subjective Expressions , 2008, LREC.

[14]  Philipp Cimiano,et al.  Joint and Pipeline Probabilistic Models for Fine-Grained Sentiment Analysis: Extracting Aspects, Subjective Phrases and their Relations , 2013, 2013 IEEE 13th International Conference on Data Mining Workshops.

[15]  William W. Cohen,et al.  Semi-Markov Conditional Random Fields for Information Extraction , 2004, NIPS.

[16]  Janyce Wiebe,et al.  Articles: Recognizing Contextual Polarity: An Exploration of Features for Phrase-Level Sentiment Analysis , 2009, CL.

[17]  Bing Liu,et al.  Mining and summarizing customer reviews , 2004, KDD.

[18]  Jacob Cohen A Coefficient of Agreement for Nominal Scales , 1960 .

[19]  Iryna Gurevych,et al.  Sentence and Expression Level Annotation of Opinions in User-Generated Discourse , 2010, ACL.

[20]  Himabindu Lakkaraju,et al.  Exploiting Coherence for the Simultaneous Discovery of Latent Facets and associated Sentiments , 2011, SDM.

[21]  Jordan L. Boyd-Graber,et al.  Grammatical structures for word-level sentiment detection , 2012, NAACL.

[22]  Bo Pang,et al.  A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts , 2004, ACL.

[23]  Maarten de Rijke,et al.  Identifying entity aspects in microblog posts , 2012, SIGIR '12.

[24]  Janyce Wiebe,et al.  Learning Subjective Adjectives from Corpora , 2000, AAAI/IAAI.

[25]  Xiaojun Li,et al.  A sentiment analysis model for hotel reviews based on supervised learning , 2011, 2011 International Conference on Machine Learning and Cybernetics.