Big Data, Selection Bias, and the Statistical Patterns of Mortality in Conflict

This paper explores how information about killings in conflict is generated, and how that generation process shapes the statistical patterns in the observed data. The difference between the observed patterns and the true patterns is called bias, and we examine two examples of it. First, we compare multiple individual sources reporting identifiable killings in Syria, highlighting how the probability of reporting likely varies across events of different sizes. Second, we conduct a similar analysis of the Iraq Body Count public dataset, examining the number of sources that report events of varying sizes. In both cases we explore how relying on the observed data, without accounting for the bias caused by missing data, could mislead policy. The paper closes with recommendations about the use of data and analysis in the development of policy.