Treating Words as Data with Error: Uncertainty in Text Statements of Policy Positions

Political text offers extraordinary potential as a source of information about the policy positions of political actors. Despite recent advances in computational text analysis, human interpretative coding of text remains an important source of text-based data, ultimately required to validate more automatic techniques. The profession’s main source of cross-national, time-series data on party policy positions comes from the human interpretative coding of party manifestos by the Comparative Manifesto Project (CMP). Despite widespread use of these data, the uncertainty associated with each point estimate has never been available, undermining the value of the dataset as a scientific resource. We propose a remedy. First, we characterize processes by which CMP data are generated. These include inherently stochastic processes of text authorship, as well as of the parsing and coding of observed text by humans. Second, we simulate these error-generating processes by bootstrapping analyses of coded quasi-sentences. This allows us to estimate precise levels of nonsystematic error for every category and scale reported by the CMP for its entire set of 3,000-plus manifestos. Using our estimates of these errors, we show how to correct biased inferences, in recent prominently published work, derived from statistical analyses of error-contaminated CMP data.
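
The bootstrapping idea can be illustrated with a minimal sketch. It assumes a manifesto is represented as a list of category labels, one per coded quasi-sentence, and that resampling those labels with replacement approximates the nonsystematic error in a category's percentage share. The function name, category labels, and counts below are hypothetical, and this is not necessarily the exact procedure used to produce the CMP error estimates.

```python
import random
import statistics


def bootstrap_category_se(codes, category, n_boot=1000, seed=0):
    """Bootstrap the percentage share of one category in a coded manifesto.

    codes    -- list of category labels, one per quasi-sentence (hypothetical data)
    category -- the label whose share we want to assess
    Returns (mean bootstrapped share, bootstrap standard error), both in percent.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(codes)
    shares = []
    for _ in range(n_boot):
        # Resample quasi-sentences with replacement, keeping manifesto length fixed.
        resample = rng.choices(codes, k=n)
        shares.append(100.0 * resample.count(category) / n)
    return statistics.mean(shares), statistics.stdev(shares)


# Hypothetical manifesto: 200 quasi-sentences, 30 of them coded "welfare".
codes = ["welfare"] * 30 + ["economy"] * 50 + ["other"] * 120
mean_share, se = bootstrap_category_se(codes, "welfare")
print(f"welfare share: {mean_share:.1f}% (bootstrap SE {se:.1f})")
```

Under these assumptions, shorter manifestos yield larger standard errors for the same observed share, which is why a single point estimate per category can be misleading without an accompanying uncertainty measure.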
