Paper Information - Identifying Objective and Subjective Words via Topic Modeling

Identifying Objective and Subjective Words via Topic Modeling

Distinct words in a given document have either a strong or a weak ability to deliver facts (i.e., the objective sense) or to express opinions (i.e., the subjective sense), depending on the topics they are associated with. Motivated by the intuitive assumption that different words have varying degrees of discriminative power in delivering the objective sense or the subjective sense with respect to their assigned topics, this paper proposes a model named identified objective–subjective latent Dirichlet allocation (iosLDA). In the iosLDA model, the simple Pólya urn model adopted in traditional topic models is modified by combining it with a probabilistic generative process, which yields the novel "Bag-of-Discriminative-Words" (BoDW) representation of documents. Each document has two different BoDW representations, one for the objective sense and one for the subjective sense, and these are used in joint objective and subjective classification instead of the traditional Bag-of-Topics representation.
The experiments reported on documents and images demonstrate that: 1) the BoDW representation is more predictive than the traditional ones; 2) iosLDA boosts the performance of topic modeling via the joint discovery of latent topics and the different objective and subjective power hidden in every word; and 3) iosLDA has lower computational complexity than supervised LDA, especially as the number of topics increases.
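
For readers unfamiliar with the simple Pólya urn model mentioned in the abstract, the following is a minimal sketch of the classical scheme only: a ball is drawn with probability proportional to its current count and is returned together with one extra ball of the same color. It does not implement the modification proposed for iosLDA, which the abstract does not specify; the function name `simple_polya_urn` and its parameters are illustrative assumptions.

```python
import random


def simple_polya_urn(initial_counts, num_draws, seed=0):
    """Simulate the classical (simple) Polya urn.

    initial_counts: dict mapping a color to its starting number of balls.
    Returns the sequence of drawn colors and the final counts.
    This is the textbook scheme, not the iosLDA variant.
    """
    rng = random.Random(seed)
    counts = dict(initial_counts)  # copy so the caller's dict is untouched
    draws = []
    for _ in range(num_draws):
        colors = list(counts)
        weights = [counts[c] for c in colors]
        # Draw one color with probability proportional to its current count.
        color = rng.choices(colors, weights=weights, k=1)[0]
        # Return the ball plus one more of the same color ("rich get richer").
        counts[color] += 1
        draws.append(color)
    return draws, counts


draws, counts = simple_polya_urn({"objective": 1, "subjective": 1}, num_draws=50)
print(counts)
```

Each draw reinforces the color that was drawn, which is the self-reinforcing behavior that urn-based views of topic models rely on.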
