Measuring Political Sentiment on Twitter: Factor Optimal Design for Multinomial Inverse Regression

This article presents a short case study in text analysis: the scoring of Twitter posts for positive, negative, or neutral sentiment directed toward particular U.S. politicians. The study requires selection of a subsample of representative posts for sentiment scoring, a common and costly aspect of sentiment mining. As a general contribution, our application is preceded by a proposed algorithm for maximizing sampling efficiency. In particular, we outline and illustrate greedy selection of documents to build designs that are D-optimal in a topic-factor decomposition of the original text. The strategy is applied to our motivating dataset of political posts, and we outline a new technique for predicting both generic and subject-specific document sentiment through the use of variable interactions in multinomial inverse regression. Results are presented for analysis of 2.1 million Twitter posts collected around February 2012. Computer codes and data are provided as supplementary material online.
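The greedy D-optimal selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the algorithm's details, so everything here is an assumption made for illustration: documents are represented by rows of a topic-weight matrix, each step adds the document that most increases the determinant of the information matrix (via the matrix determinant lemma), and a small `ridge` term keeps the starting matrix invertible. The names `greedy_d_optimal`, `factors`, and `ridge` are hypothetical and not taken from the paper's supplementary code.

```python
import numpy as np

def greedy_d_optimal(factors, n_select, ridge=1e-6):
    """Greedily pick rows of `factors` to increase det(X'X + ridge*I).

    Illustrative sketch only; not the authors' implementation.
    """
    factors = np.asarray(factors, dtype=float)
    n_docs, n_factors = factors.shape
    # Inverse of the (ridge-regularized) information matrix, A = ridge * I.
    a_inv = np.eye(n_factors) / ridge
    available = np.ones(n_docs, dtype=bool)
    chosen = []
    for _ in range(min(n_select, n_docs)):
        # Matrix determinant lemma: det(A + x x') = det(A) * (1 + x' A^{-1} x),
        # so the best next document maximizes x' A^{-1} x.
        gain = np.einsum("ij,jk,ik->i", factors, a_inv, factors)
        gain[~available] = -np.inf
        best = int(np.argmax(gain))
        chosen.append(best)
        available[best] = False
        # Sherman-Morrison rank-one update of A^{-1} after adding x x'.
        x = factors[best]
        ax = a_inv @ x
        a_inv -= np.outer(ax, ax) / (1.0 + x @ ax)
    return chosen

# Usage with simulated topic weights (200 documents, 3 topics).
rng = np.random.default_rng(0)
topic_weights = rng.dirichlet(np.ones(3), size=200)
picked = greedy_d_optimal(topic_weights, n_select=10)
```

In this sketch the selected indices would be the posts sent for manual sentiment scoring. Working in a low-dimensional factor space rather than on raw token counts keeps the information matrix small, which is the motivation the abstract gives for the topic-factor decomposition.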
