Lexical and Hierarchical Topic Regression

Inspired by a two-level theory from political science that unifies agenda setting and ideological framing, we propose supervised hierarchical latent Dirichlet allocation (SHLDA), which jointly captures documents' multi-level topic structure and their polar response variables. Our model extends the nested Chinese restaurant process to discover tree-structured topic hierarchies. It uses both per-topic hierarchical and per-word lexical regression parameters to model response variables. SHLDA improves prediction on political affiliation and sentiment tasks and also provides insight into how the topics under discussion are framed.
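The Python sketch below illustrates the two ingredients named in the abstract, under simplifying assumptions.

- **nCRP prior:** a fixed-depth nested Chinese restaurant process draws tree paths, controlled by a single concentration parameter `gamma`.
- **Response:** the response mean is a linear combination of two terms, assuming a supervised-LDA-style linear-Gaussian response:
  - per-node weights `eta` applied to the document's empirical topic-node proportions;
  - per-word weights `tau` applied to its empirical word frequencies.

The function names, the data layout, and the exact form of the response are hypothetical illustrations, not the authors' implementation. The sketch also omits posterior inference entirely.

```python
import random
from collections import defaultdict


def sample_ncrp_path(counts, children, depth, gamma, rng):
    """Draw one root-to-leaf path of `depth` nodes from a nested CRP prior.

    Hypothetical helper. Nodes are tuples of child indices (the root is ()).
    counts[node]   : number of earlier paths that passed through `node`
    children[node] : list of existing child nodes of `node`
    Both containers (defaultdicts) are updated in place.
    """
    node = ()
    path = [node]
    for _ in range(depth - 1):
        kids = children[node]
        total = sum(counts[k] for k in kids)
        # CRP step: existing child k with prob counts[k] / (total + gamma),
        # a brand-new child with prob gamma / (total + gamma).
        r = rng.random() * (total + gamma)
        chosen = None
        for k in kids:
            r -= counts[k]
            if r < 0:
                chosen = k
                break
        if chosen is None:
            chosen = node + (len(kids),)
            kids.append(chosen)
        node = chosen
        path.append(node)
    for n in path:
        counts[n] += 1
    return path


def response_mean(node_counts, word_counts, eta, tau, n_tokens):
    """Assumed linear response mean: hierarchical plus lexical terms.

    Hypothetical helper.
    node_counts : tokens the document assigns to each topic node
    word_counts : occurrences of each word type in the document
    eta, tau    : per-node and per-word regression weights (missing -> 0)
    """
    hierarchical = sum(eta.get(k, 0.0) * c for k, c in node_counts.items())
    lexical = sum(tau.get(v, 0.0) * c for v, c in word_counts.items())
    # Dividing by n_tokens turns raw counts into empirical proportions.
    return (hierarchical + lexical) / n_tokens


if __name__ == "__main__":
    rng = random.Random(0)
    counts, children = defaultdict(int), defaultdict(list)
    for _ in range(5):
        print(sample_ncrp_path(counts, children, depth=3, gamma=1.0, rng=rng))
```

Passing `defaultdict` containers lets unseen nodes start with zero counts and no children. In a full model, the paths and the weights `eta` and `tau` would be fit to data rather than drawn from the prior or fixed by hand as here.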
