Generic high-throughput methods for multilingual sentiment detection

Digital ecosystems typically involve a large number of participants from different sectors who generate rapidly growing archives of unstructured text. Measuring the frequency of certain terms to determine the popularity of a topic is comparatively straightforward. Detecting sentiment expressed in user-generated electronic content is more challenging, especially when digital ecosystems comprise heterogeneous sets of multilingual documents. This paper describes the use of language-specific grammar patterns and multilingual tagged dictionaries to detect sentiment in German and English document repositories. Digital ecosystems may contain millions of frequently updated documents, which calls for sentiment detection methods that maximize throughput. The ideal combination of high-throughput techniques and more accurate (but slower) approaches depends on the specific requirements of an application. To accommodate a wide range of possible applications, this paper presents (i) an adaptive method that balances accuracy and scalability when processing multilingual textual sources, (ii) a generic approach for generating language-specific grammar patterns and multilingual tagged dictionaries, and (iii) an extensive evaluation of the method's performance based on Amazon product reviews and on user evaluations from Sentiment Quiz, a "game with a purpose" that invites users of the Facebook social networking platform to assess the sentiment of individual sentences.
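To make the idea of combining tagged dictionaries with language-specific grammar patterns more concrete, the following is a minimal sketch, not the authors' implementation. It assumes hypothetical toy dictionaries for English and German (the entries in `LEXICONS` are illustrative only) and a single hypothetical grammar pattern: a negator appearing within `WINDOW` tokens before a sentiment term inverts that term's polarity. The names `LEXICONS`, `NEGATORS`, `WINDOW` and `score` are assumptions introduced for illustration.

```python
import re

# Hypothetical tagged dictionaries: term -> polarity weight in [-1, 1].
# Entries are illustrative only and are NOT the paper's actual lexicons.
LEXICONS = {
    "en": {"good": 1.0, "great": 1.0, "excellent": 1.0,
           "bad": -1.0, "poor": -1.0, "terrible": -1.0},
    "de": {"gut": 1.0, "toll": 1.0, "hervorragend": 1.0,
           "schlecht": -1.0, "schrecklich": -1.0},
}

# Hypothetical language-specific negation pattern: a negator that precedes
# a sentiment term within WINDOW tokens inverts the term's polarity.
NEGATORS = {
    "en": {"not", "never", "no"},
    "de": {"nicht", "nie", "kein", "keine"},
}
WINDOW = 3

# \w matches Unicode word characters, so German umlauts stay inside tokens.
TOKEN_RE = re.compile(r"\w+")


def score(text: str, lang: str) -> float:
    """Return the mean polarity of sentiment-bearing tokens (0.0 if none)."""
    lexicon = LEXICONS[lang]
    negators = NEGATORS[lang]
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    values = []
    for i, tok in enumerate(tokens):
        weight = lexicon.get(tok)
        if weight is None:
            continue  # token carries no dictionary sentiment
        # Apply the negation pattern: look back up to WINDOW tokens.
        context = tokens[max(0, i - WINDOW):i]
        if any(c in negators for c in context):
            weight = -weight
        values.append(weight)
    return sum(values) / len(values) if values else 0.0


if __name__ == "__main__":
    print(score("This phone is not good", "en"))     # -1.0
    print(score("Der Film ist wirklich gut", "de"))  # 1.0
```

A pure dictionary lookup like this is cheap enough for high-throughput processing, but it misses inflected forms (e.g. German "toller") and longer-range constructions; handling those is where richer grammar patterns or slower, more accurate approaches would come in.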
