We present a regularization technique for latent-variable mixed-membership models, based on pseudo-observed variables, that lets the modeler impose preferences on the characteristics of aggregate functions of latent and observed variables. We apply the framework to regularize topic models, which are latent-variable mixed-membership models for language modeling. In many domains, documents and words exhibit only a slight degree of mixed-membership behavior. Topic models capture this poorly because they are overly liberal in permitting mixed membership. We use the regularization to control the degree of word polysemy that a topic model permits and to prefer sparse topic distributions for documents. This is much more flexible than what modifying the priors allows. We evaluate how well the regularization exploits sentiment-indicative features in two ways: intrinsically, using document perplexity, and extrinsically, by using the models to predict the star ratings of movie and product reviews from the review text. Our experiments show that finely controlling the behavior of topic models through this regularization yields lower perplexity and lower mean squared error in the star-prediction task.
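The abstract does not give the exact form of the regularizer. The following is a minimal sketch that makes the idea concrete under stated assumptions. It treats the Shannon entropy of an empirical topic distribution as the "aggregate function" in two places. For a document, that entropy serves as a proxy for how non-sparse its topic mixture is. For a word type, it serves as a proxy for how polysemous the word is. The sketch also adds a hypothetical pseudo-observed variable whose likelihood is `exp(-lam * H)`. Conditioning on that variable tilts a Gibbs sampler's per-token topic conditional toward assignments that keep the document entropy low. All function names, the entropy choice, the exponential likelihood, and the `base_weights` input (assumed to come from an ordinary LDA conditional) are illustrative assumptions, not the paper's method.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in nats) of the distribution implied by a count vector."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def doc_topic_entropy(doc_assignments, num_topics):
    """Entropy of one document's empirical topic distribution (sparsity proxy)."""
    counts = Counter(doc_assignments)
    return entropy([counts.get(k, 0) for k in range(num_topics)])


def word_polysemy_entropy(word_topic_pairs, word, num_topics):
    """Entropy of the topics that one word type is assigned to (polysemy proxy)."""
    counts = Counter(k for w, k in word_topic_pairs if w == word)
    return entropy([counts.get(k, 0) for k in range(num_topics)])


def regularized_topic_weights(base_weights, doc_assignments, position, num_topics, lam):
    """Reweight one token's topic conditional by a hypothetical pseudo-observation.

    The pseudo-observed variable is assumed to have likelihood
    exp(-lam * H(document topic distribution)), so candidate topics that keep
    the document's topic mixture sparse (low entropy) receive more weight.
    lam = 0 recovers the unregularized conditional.
    """
    weights = []
    for k, w in enumerate(base_weights):
        trial = list(doc_assignments)
        trial[position] = k  # tentatively assign topic k to this token
        penalty = math.exp(-lam * doc_topic_entropy(trial, num_topics))
        weights.append(w * penalty)
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # A document whose first three tokens sit in topic 0; resample token 3.
    probs = regularized_topic_weights([0.5, 0.5], [0, 0, 0, 1], 3, 2, lam=2.0)
    print(probs)  # topic 0 is now favored over topic 1
```

With `lam=0` the base conditional is returned unchanged. Larger `lam` pushes tokens toward the topics a document already uses. A polysemy preference could be expressed the same way by penalizing `word_polysemy_entropy` instead of the document entropy.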