AWATIF: A Multi-Genre Corpus for Modern Standard Arabic Subjectivity and Sentiment Analysis

We present AWATIF, a multi-genre corpus of Modern Standard Arabic (MSA) labeled for subjectivity and sentiment analysis (SSA) at the sentence level. The corpus is labeled using both regular and crowdsourcing methods, under three different conditions and with two types of annotation guidelines. We describe the sub-corpora that make up the corpus and provide examples from the various SSA categories. In the process, we present our linguistically motivated, genre-nuanced annotation guidelines and provide evidence of their impact on the labeling task.
