Measuring Prejudice and Ethnic Tensions in User-Generated Content

With the spread of social media, ethnic prejudice is becoming publicly visible to widening audiences and may have serious offline consequences. This creates demand for detecting prejudice and other signs of ethnic tension in user-generated texts, a task that differs substantially from measuring prejudice with surveys, the approach traditionally developed in psychology. In this work we use a hand-coding instrument based on psychological definitions of prejudice and sociological methods of questionnaire construction. Compared to our previous research, we double our hand-coded collection, which now comprises 14,998 unique user texts retrieved from Russian-language social media. We then train classification algorithms to "guess" prejudice as detected by human coders and show a significant improvement in quality compared to our earlier results. Still, as not all aspects of prejudice are detected sufficiently well, we analyze potential causes of the low quality and outline directions for further improvement.