Visual Analysis of Conflicting Opinions

Understanding the nature and dynamics of conflicting opinions is a profound and challenging problem. In this paper we address several aspects of the problem through a study of more than 3,000 Amazon customer reviews of the controversial bestseller The Da Vinci Code, including 1,738 positive and 918 negative reviews. The study is motivated by critical questions such as: What are the differences between positive and negative reviews? What is the origin of a particular opinion? How do these opinions change over time? To what extent can differentiating features be identified from unstructured text? How accurately can these features predict the category of a review? We first analyze terminology variations in these reviews in terms of the syntactic, semantic, and statistical associations identified by TermWatch, and use term variation patterns to depict underlying topics. We then select the most predictive terms based on log-likelihood tests and demonstrate that this small set of terms correctly classifies over 70% of the conflicting reviews. This feature selection process reduces the dimensionality of the feature space from more than 20,000 dimensions to a couple of hundred. We use automatically generated decision trees to facilitate the understanding of conflicting opinions in terms of these highly predictive terms. This study also uses a number of visualization and modeling tools to identify not only what positive and negative reviews have in common, but also how they differ and evolve over time.
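The log-likelihood term selection step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Dunning's G² statistic computed over a 2×2 table of document frequencies (term present/absent × positive/negative review), since the exact contingency formulation is not specified here. The names `g2` and `rank_terms` and the toy reviews are hypothetical.

```python
import math
from collections import Counter


def g2(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    Assumed layout (hypothetical):
      a = positive reviews containing the term, b = negative reviews containing it
      c = positive reviews without the term,    d = negative reviews without it
    """
    n = a + b + c + d
    observed = [a, b, c, d]
    row = [a + b, a + b, c + d, c + d]  # totals for term present / absent
    col = [a + c, b + d, a + c, b + d]  # totals for positive / negative class
    total = 0.0
    for o, r, k in zip(observed, row, col):
        if o > 0:  # cells with zero count contribute 0 (0 * ln 0 := 0)
            expected = r * k / n
            total += o * math.log(o / expected)
    return 2.0 * total


def rank_terms(pos_docs: list[list[str]], neg_docs: list[list[str]], top_k: int) -> list[str]:
    """Return the top_k terms ranked by G^2, using per-class document frequencies."""
    pos_df = Counter(t for doc in pos_docs for t in set(doc))
    neg_df = Counter(t for doc in neg_docs for t in set(doc))
    n_pos, n_neg = len(pos_docs), len(neg_docs)
    scores = {}
    for term in pos_df.keys() | neg_df.keys():
        a, b = pos_df[term], neg_df[term]
        scores[term] = g2(a, b, n_pos - a, n_neg - b)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


# Toy, tokenized reviews (hypothetical data, not from the study).
pos = [["brilliant", "plot"], ["brilliant", "fun"], ["fun", "plot"]]
neg = [["boring", "plot"], ["boring", "history"], ["history", "plot"]]
top = rank_terms(pos, neg, top_k=4)
```

In this toy data "plot" occurs equally often in both classes and scores a G² of 0, so it is excluded, while the class-specific terms rank highest. Keeping only the top-ranked terms is one way to shrink a vocabulary of many thousands of terms down to a few hundred features before training a classifier such as a decision tree.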
