Finding Sentiment Dimension in Vector Space of Movie Reviews: An Unsupervised Approach

This study proposes an unsupervised method for finding the sentiment orientations of words in Korean movie reviews. The orientations are represented as real values along a sentiment dimension, which is derived from a high-dimensional vector space built from the movie reviews. To search for this dimension, Pointwise Mutual Information (PMI) is first used to select a set of words that are close to common modifiers; phrases composed of these words often form good/bad associations (e.g., “good acting”, “terrible acting”). A neural language model (Word2Vec) is then used to calculate the point-wise similarity distances between the chosen words, and dimensionality reduction algorithms (e.g., PCA, MDS) are employed to find the axis of sentiment orientation. Finally, the performance of the method is measured by unsupervised classification of two movie review datasets based on the orientation values. According to the results, the best accuracy reaches 66% and 76% on the two datasets, respectively.
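The two numerical steps named in the abstract, PMI scoring and projection onto a principal axis, can be illustrated with a minimal sketch. It is not the authors' implementation: the counts and the 3-dimensional word vectors are invented toy values standing in for corpus statistics and Word2Vec embeddings, English words replace the Korean vocabulary, a plain power iteration replaces a library PCA, and orienting the axis with a seed word ("good") is an assumption made here only to fix the arbitrary sign of the axis.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise Mutual Information: log2( p(x, y) / (p(x) * p(y)) )."""
    return math.log2((count_xy * total) / (count_x * count_y))

def first_principal_axis(vectors, iters=200):
    """Center the vectors and return (centered, leading principal axis)."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    # Covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    axis = [1.0] * d
    for _ in range(iters):
        nxt = [sum(cov[i][j] * axis[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        axis = [x / norm for x in nxt]
    return centered, axis

# Toy co-occurrence counts (hypothetical): a word seen 20 times, a modifier
# seen 50 times, 10 co-occurrences, in a corpus of 1000 word pairs.
association = pmi(count_xy=10, count_x=20, count_y=50, total=1000)

# Toy embeddings (hypothetical stand-ins for Word2Vec vectors).
words = ["good", "great", "bad", "terrible"]
vectors = [
    [0.9, 0.1, 0.2],
    [0.8, 0.2, 0.1],
    [-0.7, 0.1, 0.2],
    [-0.9, 0.2, 0.1],
]

centered, axis = first_principal_axis(vectors)
scores = {w: sum(c[j] * axis[j] for j in range(len(axis)))
          for w, c in zip(words, centered)}

# The sign of a principal axis is arbitrary; flip it so that the seed word
# "good" scores positive (an assumption of this sketch, not of the paper).
if scores["good"] < 0:
    scores = {w: -s for w, s in scores.items()}

for word, score in scores.items():
    print(f"{word}: {score:+.3f}")
```

In this toy setting each word's orientation is the sign of its projection onto the axis, and a review could be scored by averaging the orientation values of its words; how the paper aggregates scores at the review level is not stated in the abstract.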
