Making useful conflict predictions

One of the major obstacles in predicting state failure is that event onset is relatively rare. This class skew can cause difficulties both in estimating a model and in selecting a decision boundary. Since the publication of King & Zeng’s studies in 2001, scholars have used case-control methods to address the issue. This article builds on the landmark research of the Political Instability Task Force (PITF), comparing the case-control approach with several other methods, some drawn from the machine learning field and some original to this study. Case-control methods have several practical disadvantages and show no measurable advantage in prediction. The article also introduces cost-sensitive methods for determining a decision boundary. This explication raises questions about the Task Force’s formulation of a decision boundary and suggests methods for making predictions that are useful for policy. I find that the decision boundary chosen by the PITF implicitly assumes that the cost of intervention is about 7.7% of the cost of not intervening in a case where state failure does take place. These findings demonstrate that much work remains to be done in predicting state failure, especially in limiting the number of false positives. More generally, they suggest caution in using accuracy as a measure of success when the data exhibit significant class imbalance.
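
The two quantitative points in the abstract, a cost-implied decision boundary and the weakness of accuracy under class imbalance, can be illustrated with a minimal sketch. It is not the article's or the PITF's procedure. The function names, the simplified cost structure (intervening always costs a fixed amount, not intervening costs something only if failure occurs, correct non-intervention costs nothing), and the toy sample of 20 onsets in 1,000 cases are all assumptions made for illustration. Only the 7.7% cost ratio comes from the abstract.

```python
def decision_threshold(cost_intervention, cost_missed_failure):
    """Probability of failure above which intervening is cheaper in expectation.

    Assumed cost structure (illustrative, not taken from the article):
      - intervening always costs `cost_intervention`;
      - not intervening costs `cost_missed_failure` only if failure occurs.
    Intervene when cost_intervention < p * cost_missed_failure,
    i.e. when p > cost_intervention / cost_missed_failure.
    """
    if cost_intervention < 0 or cost_missed_failure <= 0:
        raise ValueError("costs must be non-negative, with a positive failure cost")
    return min(1.0, cost_intervention / cost_missed_failure)


def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall) for binary labels, where 1 marks an onset."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(y_true)
    recall = true_pos / positives if positives else 0.0
    return correct / len(y_true), recall


if __name__ == "__main__":
    # Cost ratio of 7.7% (from the abstract), in arbitrary cost units.
    print("implied threshold:", decision_threshold(7.7, 100.0))

    # Hypothetical imbalanced sample: 20 onsets among 1,000 country-years,
    # scored by a classifier that never predicts an onset.
    y_true = [1] * 20 + [0] * 980
    y_pred = [0] * 1000
    acc, rec = accuracy_and_recall(y_true, y_pred)
    print("accuracy:", acc, "recall:", rec)
```

Under these assumptions a cost ratio of 7.7% corresponds to intervening whenever the predicted probability of failure exceeds roughly 0.077. A different cost structure, for example one in which a correctly targeted intervention is treated as costless, would give a somewhat different boundary. The second part shows why accuracy is a poor yardstick here: the classifier that never predicts an onset is right in 98% of cases while detecting none of the 20 onsets.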

[1]  Nitesh V. Chawla, et al. SMOTE: Synthetic Minority Over-sampling Technique, 2002, J. Artif. Intell. Res.

[2]  N. Japkowicz. Learning from Imbalanced Data Sets: A Comparison of Various Strategies, 2000.

[3]  J. Goldstone, et al. A Global Model for Forecasting Political Instability, 2010.

[4]  Jens Hainmueller, et al. Kernel Regularized Least Squares: Moving Beyond Linearity and Additivity Without Sacrificing Interpretability, 2013.

[5]  Charles Elkan, et al. The Foundations of Cost-Sensitive Learning, 2001, IJCAI.

[6]  J. Stanley. Quasi-Experimentation, 1965, The School Review.

[7]  Susan D. Hyde, et al. Which Elections Can Be Lost?, 2011, Political Analysis.

[8]  Bruce Bueno de Mesquita, et al. The War Trap, 1981.

[9]  Dennis L. Wilson, et al. Asymptotic Properties of Nearest Neighbor Rules Using Edited Data, 1972, IEEE Trans. Syst. Man Cybern.

[10]  Daniel C. Esty, et al. State Failure Task Force Report: Phase II Findings, 1999.

[11]  Bruce Bueno de Mesquita, et al. A Political Economy of Aid, 2009, International Organization.

[12]  Pedro M. Domingos, et al. Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier, 1996, ICML.

[13]  Andrew Gelman, et al. Data Analysis Using Regression and Multilevel/Hierarchical Models, 2006.

[14]  Michael S. Lewis-Beck, et al. Election Forecasting for Turbulent Times, 2012, PS: Political Science & Politics.

[15]  Andrew D. Martin, et al. Competing Approaches to Predicting Supreme Court Decision Making, 2004, Perspectives on Politics.

[16]  Jeannette Jet Lawrence, et al. Untangling neural nets, 1990.

[17]  G. King, et al. Improving Quantitative Studies of International Conflict: A Conjecture, 2000, American Political Science Review.

[18]  Carl E. Klarner, et al. State-Level Forecasts of the 2012 Federal and Gubernatorial Elections, 2012, PS: Political Science & Politics.

[19]  Philip A. Schrodt. Seven Deadly Sins of Contemporary Quantitative Political Analysis, 2010.

[20]  M. Kubát. An Introduction to Machine Learning, 2017, Springer International Publishing.

[21]  Stanley A. Feder, et al. Forecasting for Policy Making in the Post–Cold War Period, 2002.

[22]  P. Collier, et al. Greed and Grievance in Civil War, 1999.

[23]  David Mease, et al. Evidence Contrary to the Statistical View of Boosting, 2008, J. Mach. Learn. Res.

[24]  Freeman Dyson, et al. A meeting with Enrico Fermi, 2004, Nature.

[25]  Tom Fawcett, et al. An introduction to ROC analysis, 2006, Pattern Recognit. Lett.

[26]  Gary King, et al. Improving Forecasts of State Failure, 2001.

[27]  Virginia Page Fortna, et al. Pitfalls and Prospects in the Peacekeeping Literature, 2008.

[28]  Kristin M. Bakke, et al. The perils of policy by p-value: Predicting civil conflicts, 2010.

[29]  Gustavo E. A. P. A. Batista, et al. A study of the behavior of several methods for balancing machine learning training data, 2004, SKDD.

[30]  Paul Collier, et al. Development and Conflict, 2004.

[31]  Henrik Urdal, et al. Predicting Armed Conflict, 2010–2050, 2013.

[32]  L. Keele. Semiparametric Regression for the Social Sciences, 2008.

[33]  Simon Jackman, et al. Bayesian Analysis for the Social Sciences, 2009.

[34]  Mehmed Kantardzic, et al. Learning from Data, 2011.

[35]  Gary King, et al. Logistic Regression in Rare Events Data, 2001, Political Analysis.

[36]  N. Henstridge. Rates of Return on Physical and Human Capital in Africa's Manufacturing Sector, 1997.

[37]  Arcot Sowmya, et al. Forecasting the onset of genocide and politicide, 2013.

[38]  Gary King, et al. Explaining Rare Events in International Relations, 2001, International Organization.

[39]  Edward Leamer. Tantalus on the Road to Asymptopia, 2010.

[40]  Stan Matwin, et al. Addressing the Curse of Imbalanced Training Sets: One-Sided Selection, 1997, ICML.

[41]  Nicholas Sambanis, et al. Breaking the Conflict Trap: Civil War and Development Policy, 2003.

[42]  Steven Salzberg, et al. On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach, 1997, Data Mining and Knowledge Discovery.

[43]  Alison Levin-Rector, et al. Neonatal, postneonatal, childhood, and under-5 mortality for 187 countries, 1970–2010: a systematic analysis of progress towards Millennium Development Goal 4, 2010, The Lancet.

[44]  Sean P. O'Brien, et al. Crisis Early Warning and Decision Support: Contemporary Approaches and Thoughts on Future Research, 2010.

[45]  G. King, et al. Analyzing Incomplete Political Science Data: An Alternative Algorithm for Multiple Imputation, 2001, American Political Science Review.

[46]  M. Sarkees, et al. Resort to War: 1816–2007, 2010.

[47]  Matthew P. Hitt, et al. Time Series Analysis for the Social Sciences, 2014.

[48]  D. Green, et al. Modeling Heterogeneous Treatment Effects in Survey Experiments with Bayesian Additive Regression Trees, 2012.

[49]  D. Campbell. Factors relevant to the validity of experiments in social settings, 1957, Psychological Bulletin.

[50]  Stephen J. Andriole, et al. Toward the Development of an Integrated Crisis Warning System, 1977.

[51]  E. DeLong, et al. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach, 1988, Biometrics.

[52]  J. Goldstone, et al. The State Failure Project: Early Warning Research for US Foreign Policy Planning, 1998.

[53]  Nathalie Japkowicz, et al. The class imbalance problem: A systematic study, 2002, Intell. Data Anal.

[54]  Foster Provost, et al. The effect of class distribution on classifier learning: an empirical study, 2001.

[55]  J. Fearon, et al. Ethnicity, Insurgency, and Civil War, 2003, American Political Science Review.

[56]  Karl Rihaczek, et al. What Is Data Mining?, 2019, Data Mining for the Social Sciences.

[57]  Michael D. Ward, et al. Forecasting is difficult, especially about the future, 2013.