Identifying Objective and Subjective Words via Topic Modeling

Words in a document differ in how strongly they deliver facts (the objective sense) or express opinions (the subjective sense), and this strength depends on the topics with which they are associated. Motivated by the intuitive assumption that words have varying degrees of discriminative power in conveying the objective or the subjective sense with respect to their assigned topics, this paper proposes a model named identified objective-subjective latent Dirichlet allocation (LDA), abbreviated iosLDA. In iosLDA, the simple Pólya urn model adopted in traditional topic models is modified by combining it with a probabilistic generative process, which yields a novel "Bag-of-Discriminative-Words" (BoDW) representation of documents. Each document has two BoDW representations, one for the objective sense and one for the subjective sense, and these are used in joint objective and subjective classification in place of the traditional Bag-of-Topics representation.
Experiments on documents and images demonstrate that: 1) the BoDW representation is more predictive than traditional representations; 2) iosLDA improves topic modeling performance by jointly discovering latent topics and the objective and subjective power of every word; and 3) iosLDA has lower computational complexity than supervised LDA, especially as the number of topics increases.
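
For readers unfamiliar with the simple Pólya urn scheme mentioned above, the following is a minimal sketch of the unmodified urn only: a ball is drawn with probability proportional to the current counts and returned together with one extra ball of the same color, so frequently drawn colors become more likely. It does not reproduce the iosLDA modification, whose details are not given in this abstract; the function name `simple_polya_urn` and the color labels are hypothetical and chosen purely for illustration.

```python
import random
from collections import Counter


def simple_polya_urn(initial_counts, n_draws, seed=0):
    """Simulate a simple Polya urn (illustrative sketch, not iosLDA).

    initial_counts: mapping from color to its starting number of balls.
    Returns the final urn counts and the sequence of drawn colors.
    """
    rng = random.Random(seed)
    urn = Counter(initial_counts)
    draws = []
    for _ in range(n_draws):
        colors = list(urn.keys())
        weights = [urn[c] for c in colors]
        # Draw one color with probability proportional to its count.
        color = rng.choices(colors, weights=weights, k=1)[0]
        # Return the ball plus one extra of the same color
        # (the "rich get richer" reinforcement step).
        urn[color] += 1
        draws.append(color)
    return urn, draws


# Hypothetical two-color urn standing in for two word types.
urn, draws = simple_polya_urn({"fact": 1, "opinion": 1}, n_draws=50)
print(dict(urn))
```

Each draw adds exactly one ball, so the urn always ends with the initial total plus the number of draws; how that total splits across colors depends on the random seed.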
