Kashmir: A Computational Analysis of the Voice of Peace

The recent Pulwama terror attack (February 14, 2019, Pulwama, Kashmir) triggered a chain of escalating events between India and Pakistan, adding another episode to their 70-year-old dispute over Kashmir. The present era of ubiquitous social media has never seen nuclear powers closer to war. In this paper, we analyze this evolving international crisis via a substantial corpus constructed from comments on YouTube videos (921,235 English comments posted by 392,460 users, out of 2.04 million overall comments by 791,289 users on 2,890 videos). Our main contributions are threefold. First, we observe that polyglot word embeddings reveal precise and accurate language clusters, and we use this observation to construct a document language-identification technique with negligible annotation requirements. We demonstrate its viability and utility across a variety of data sets involving several low-resource languages. Second, we present an extensive analysis of temporal trends in pro-peace and pro-war intent using a manually constructed polarity phrase lexicon. We observe that when tensions between the two nations were at their peak, pro-peace intent in the corpus was at its highest point. Finally, in the context of heated discussions in a politically tense situation where two nations are on the brink of a full-fledged war, we argue for the importance of automatically identifying user-generated web content that can diffuse hostility, and we address this prediction task, dubbed hope-speech detection.
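
The lexicon-based temporal analysis described above can be illustrated with a minimal sketch. It assumes comments arrive as (date, text) pairs and counts, per day, how many comments contain at least one phrase from each polarity list. The phrases, the function name `daily_intent_counts`, and the sample comments are hypothetical and chosen only for illustration; they are not the lexicon or the data used in the paper.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical polarity phrases for illustration only; not the paper's lexicon.
PEACE_PHRASES = ("say no to war", "we want peace")
WAR_PHRASES = ("we want war", "teach them a lesson")


def daily_intent_counts(comments):
    """Count, per day, the comments containing a pro-peace or pro-war phrase.

    `comments` is an iterable of (date, text) pairs. Matching is a simple
    case-insensitive substring test against each phrase list.
    """
    counts = defaultdict(Counter)
    for day, text in comments:
        lowered = text.lower()
        if any(phrase in lowered for phrase in PEACE_PHRASES):
            counts[day]["peace"] += 1
        if any(phrase in lowered for phrase in WAR_PHRASES):
            counts[day]["war"] += 1
    # Return plain dicts ordered by date so the result reads as a time series.
    return {day: dict(c) for day, c in sorted(counts.items())}


# Made-up sample comments, used only to exercise the function.
sample = [
    (date(2019, 2, 27), "Say no to war, please."),
    (date(2019, 2, 27), "We want peace on both sides"),
    (date(2019, 2, 27), "We want war!"),
    (date(2019, 2, 15), "teach them a lesson"),
]
trend = daily_intent_counts(sample)
```

Substring matching keeps the sketch short; a fuller treatment would likely normalize the daily counts by overall comment volume before comparing days.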
