Adaptive Ensembling: Unsupervised Domain Adaptation for Political Document Analysis

Insightful findings in political science often require researchers to analyze documents of a certain subject or type, yet these documents are usually contained in large corpora that do not distinguish between pertinent and non-pertinent documents. Conversely, corpora that do label relevant documents tend to have limitations (e.g., coverage of a single source or era) that prevent their use for political science research. To bridge this gap, we present adaptive ensembling, an unsupervised domain adaptation framework equipped with a novel text classification model and time-aware training, so that our methods work well with diachronic corpora. Experiments on an expert-annotated dataset show that our framework outperforms strong benchmarks. Further analysis indicates that our methods are more stable, learn better representations, and extract cleaner corpora for fine-grained analysis.
