UIT-VSFC: Vietnamese Students’ Feedback Corpus for Sentiment Analysis

Students’ feedback is a vital resource for interdisciplinary research that combines two fields: sentiment analysis and education. To strengthen sentiment analysis for Vietnamese, a low-resource language, we build the Vietnamese Students’ Feedback Corpus (UIT-VSFC), a free, high-quality corpus for research on two tasks: sentiment-based and topic-based classification. In this paper, we present the methods used to build the annotation guidelines and to ensure the annotation accuracy and consistency of the corpus. The resource consists of over 16,000 sentences, each human-annotated for both tasks. To assess the quality of the corpus, we measure inter-annotator agreement and classification accuracy on UIT-VSFC. We achieve an inter-annotator agreement of 91.20% for the sentiment-based task and 71.07% for the topic-based task. In addition, the best baseline model, a Maximum Entropy classifier, achieves overall F1-scores of 87.94% and 84.03% on the sentiment-based and topic-based tasks, respectively. These results indicate that the corpus is a reliable and helpful resource for research.
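
As a rough illustration of the two kinds of quality measures mentioned above, the following minimal Python sketch computes a simple observed (percentage) agreement between two annotators and per-class F1-scores for a classifier. This is an assumption-laden sketch, not the procedure used for UIT-VSFC: the abstract does not specify the exact agreement measure or how the overall F1-score is averaged, and the function names and label strings below are hypothetical.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Share of items (in %) on which two annotators chose the same label.

    Assumption: plain observed agreement; the corpus authors may have used
    a different agreement measure.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


def f1_per_class(gold, pred):
    """Per-class F1 computed as 2*TP / (2*TP + FP + FN)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # gold class g was missed
    scores = {}
    for label in set(gold) | set(pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1-scores (one possible 'overall' F1)."""
    scores = f1_per_class(gold, pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy labels, for illustration only.
annotator_1 = ["positive", "negative", "neutral", "positive"]
annotator_2 = ["positive", "negative", "positive", "positive"]
print(percent_agreement(annotator_1, annotator_2))
print(macro_f1(annotator_1, annotator_2))
```

Observed agreement does not correct for agreement expected by chance, so a chance-corrected statistic may be preferable when the label distribution is skewed.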
