Design Hybrid Models for Opinion Mining on Vietnamese Social Media Text Data

The rapid development of information and communications technology, especially the Internet and smartphones, has made it easier for customers to access social networking sites and use them as effective communication tools. A huge number of informal messages are posted every day on social networking sites, including comments, opinions, and feedback about products, services, or companies. These text data are not only in English but also in many other languages, as social networking sites have spread across countries. It has become difficult and time-consuming for individuals and organizations to process the information contained in these text data effectively. With opinion mining techniques, social media text data can be mined to explore customer opinions about products and services, as well as information about competitors. This paper proposes models for opinion mining on Vietnamese social media text data. We collected social media text data from Facebook in Vietnam and designed a dictionary of non-standard Vietnamese words to process informal Vietnamese text messages. We compared the predictive performance of several opinion mining models from the lexicon-based and machine learning approaches and then proposed a hybrid model that combines the two approaches. The results show that using the non-standard Vietnamese word dictionary improves the predictive performance of opinion mining models, and that hybrid models combining the lexicon-based and machine learning approaches perform better than single models. Based on these research outcomes, we provide recommendations for designing opinion mining models for non-English social media text data.
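
To make the pipeline described above concrete, the following is a minimal sketch of one way a non-standard word dictionary and a lexicon-plus-classifier hybrid could fit together. It is not the authors' implementation: the abstract does not state how the two approaches are combined, so the "lexicon first, classifier as fallback" rule, the threshold, the dictionary entries, the sentiment lexicon, and the names `normalize`, `lexicon_score`, and `hybrid_label` are all assumptions made for illustration.

```python
# Hypothetical non-standard word dictionary: informal spellings mapped to
# standard Vietnamese. These entries are illustrative, not from the paper.
NONSTANDARD = {"ko": "không", "k": "không", "dc": "được"}

# Hypothetical sentiment lexicon with a polarity per word.
LEXICON = {"tốt": 1.0, "đẹp": 1.0, "tệ": -1.0, "xấu": -1.0}
NEGATORS = {"không"}


def normalize(text):
    """Lowercase, split on whitespace, and replace non-standard tokens."""
    return " ".join(NONSTANDARD.get(tok, tok) for tok in text.lower().split())


def lexicon_score(text):
    """Sum word polarities; a negator flips the polarity of the next token."""
    score = 0.0
    negate = False
    for tok in text.split():
        if tok in NEGATORS:
            negate = True
            continue
        if tok in LEXICON:
            score += -LEXICON[tok] if negate else LEXICON[tok]
        negate = False
    return score


def hybrid_label(text, ml_predict, threshold=0.5):
    """Assumed combination rule: trust the lexicon when its score is strong,
    otherwise fall back to a machine learning classifier (any callable)."""
    norm = normalize(text)
    score = lexicon_score(norm)
    if abs(score) >= threshold:
        return "positive" if score > 0 else "negative"
    return ml_predict(norm)
```

Here `ml_predict` stands in for any trained classifier that maps a normalized string to a label. Other combinations, such as feeding the lexicon score to the classifier as a feature, would fit the abstract's description equally well.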