Reprint of: Computational approaches for mining user's opinions on the Web 2.0

We carry out an empirical analysis to determine the characteristics of social media channels. User generated content is "noisy" and contains mistakes, emoticons, etc. We evaluate text preprocessing algorithms on user generated content. We discuss improvements to the opinion mining process.

The emerging research area of opinion mining deals with computational methods for finding, extracting and systematically analyzing people's opinions, attitudes and emotions towards certain topics. While it provides interesting market research information, the user generated content on the Web 2.0 presents numerous challenges for systematic analysis, among them the differences and unique characteristics of the various social media channels. This article reports on the determination of such particularities and deduces their impact on text preprocessing and opinion mining algorithms. The effectiveness of different algorithms is evaluated to determine their applicability to the various social media channels. Our research shows that text preprocessing algorithms are mandatory for mining opinions on the Web 2.0 and that some of these algorithms are sensitive to the errors and mistakes contained in user generated content.
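To make the kind of noise mentioned above concrete, the following is a minimal sketch of one possible preprocessing step for user generated content. It is not the method evaluated in the article: the function name `normalize`, the two regular expressions, and the choice to collapse repeated characters to two are all assumptions made for illustration, and the emoticon pattern covers only a handful of Western-style forms.

```python
import re

# Hypothetical pattern for a few common emoticons such as :-) ;) :( =D :P
# (an illustrative assumption, far from a complete emoticon inventory).
EMOTICON_RE = re.compile(r"[:;=][-o*']?[)\](\[dDpP|]")

# Any single character repeated three or more times in a row.
ELONGATION_RE = re.compile(r"(.)\1{2,}")


def normalize(text):
    """Return (cleaned_text, emoticons) for one piece of user generated content."""
    # Collect emoticons first so they can be kept as separate sentiment cues.
    emoticons = EMOTICON_RE.findall(text)
    cleaned = EMOTICON_RE.sub(" ", text)
    # Collapse runs of 3+ identical characters to exactly two ("Coooool" -> "Cool").
    cleaned = ELONGATION_RE.sub(r"\1\1", cleaned)
    # Squeeze the whitespace left behind by the removals.
    cleaned = re.sub(r"\s+", " ", cleaned).strip()
    return cleaned, emoticons


print(normalize("Coooool phone!!! :-)"))
```

A pattern this simple will misfire on ordinary text (for example, a colon directly followed by "p"), which illustrates why such steps are sensitive to the errors and irregularities of the input.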
