Multinomial Inverse Regression for Text Analysis

Text data, including speeches, stories, and other document forms, are often connected to sentiment variables that are of interest for research in marketing, economics, and elsewhere. Such data are also very high-dimensional and difficult to incorporate into statistical analyses. This article introduces a straightforward framework of sentiment-sufficient dimension reduction for text data. Multinomial inverse regression is introduced as a general tool for simplifying predictor sets that can be represented as draws from a multinomial distribution, and we show that logistic regression of phrase counts onto document annotations can be used to obtain low-dimensional document representations that are rich in sentiment information. To facilitate this modeling, a novel estimation technique is developed for multinomial logistic regression with a very high-dimensional response. In particular, independent Laplace priors with unknown variance are assigned to each regression coefficient, and we detail an efficient routine for maximizing the joint posterior over the coefficients and their prior scale. This “gamma-lasso” scheme yields stable and effective estimation for general high-dimensional logistic regression, and we argue that it will be superior to current methods in many settings. Guidelines for prior specification are provided, algorithm convergence is detailed, and estimator properties are outlined from the perspective of the literature on nonconcave likelihood penalization. Related work on sentiment analysis from statistics, econometrics, and machine learning is surveyed and connected. Finally, the methods are applied in two detailed examples, and out-of-sample prediction studies illustrate their effectiveness.
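The abstract describes the gamma-lasso only in words. As a hedged illustration, the sketch below assumes that each coefficient φ has a Laplace prior with rate λ and that λ in turn has a Gamma prior with shape s and rate r (these symbols and this parameterization are our labels, not taken from the text above). Maximizing the joint prior over λ for fixed φ gives λ = s/(r + |φ|), and substituting back yields the log penalty s log(1 + |φ|/r) up to an additive constant. To stay self-contained, a single Gaussian observation y stands in for the multinomial likelihood, so `joint_map_gaussian` is a toy alternating scheme, not the estimation routine the article describes. `sufficient_reduction` shows the kind of low-dimensional document score, z = Σ_j φ_j x_j / m for phrase counts x_j and total count m, that fitted phrase loadings would produce. All function names are hypothetical.

```python
import math

# Assumed toy parameterization: phi ~ Laplace(rate lam), lam ~ Gamma(shape s, rate r).
# Up to additive constants, -log p(phi, lam) = lam*|phi| - s*log(lam) + r*lam.

def joint_neg_log_prior(phi, lam, s, r):
    """Negative log joint prior of (phi, lam), dropping additive constants."""
    return lam * abs(phi) - s * math.log(lam) + r * lam

def gamma_lasso_scale(phi, s, r):
    """Conditional mode of lam given phi: solves |phi| - s/lam + r = 0."""
    return s / (r + abs(phi))

def gamma_lasso_penalty(phi, s, r):
    """Profile penalty: min over lam of joint_neg_log_prior, up to a constant."""
    return s * math.log(1.0 + abs(phi) / r)

def soft_threshold(y, lam):
    """Minimizer over phi of 0.5*(y - phi)**2 + lam*|phi| (the lasso step)."""
    return math.copysign(max(abs(y) - lam, 0.0), y)

def joint_map_gaussian(y, s, r, tol=1e-10, max_iter=1000):
    """Alternate exact phi- and lam-updates for the toy objective
    0.5*(y - phi)**2 + joint_neg_log_prior(phi, lam, s, r).
    A single Gaussian observation replaces the multinomial likelihood here."""
    phi = 0.0
    for _ in range(max_iter):
        lam = gamma_lasso_scale(phi, s, r)   # update the prior scale given phi
        new_phi = soft_threshold(y, lam)     # update phi given the current scale
        if abs(new_phi - phi) < tol:
            return new_phi, gamma_lasso_scale(new_phi, s, r)
        phi = new_phi
    return phi, gamma_lasso_scale(phi, s, r)

def sufficient_reduction(counts, phi):
    """Project one document's phrase frequencies onto loadings phi:
    z = sum_j phi_j * counts_j / m, where m is the total count."""
    m = sum(counts)
    if m <= 0:
        raise ValueError("document must contain at least one token")
    return sum(p * x for p, x in zip(phi, counts)) / m
```

For instance, `joint_map_gaussian(3.0, 1.0, 1.0)` converges to φ = 1 + √3 ≈ 2.732 (the positive root of φ² - 2φ - 2 = 0), while `joint_map_gaussian(0.5, 1.0, 1.0)` returns exactly φ = 0 because |y| never exceeds the initial threshold s/r. Because the implied λ shrinks as |φ| grows, large coefficients are penalized less than under a fixed-rate lasso.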
