Characterising User Content on a Multi-lingual Social Network

Social media has been at the vanguard of political information diffusion in the 21st century. Most studies of disinformation, political influence and fake news focus on mainstream social media platforms. This has inevitably made English an important factor in our current understanding of political activity on social media. As a result, only a limited number of studies cover a large portion of the world, including the largest multilingual and multicultural democracy: India. In this paper we present our characterisation of ShareChat, a multilingual social network in India. We collect an exhaustive dataset spanning 72 weeks before and during the Indian general elections of 2019, across 14 languages. We investigate cross-lingual dynamics by clustering visually similar images together and exploring how they move across language barriers. We find that Telugu, Malayalam, Tamil and Kannada tend to be dominant in soliciting political images (often referred to as memes), and that posts from Hindi (as well as images containing text in English) have the largest cross-lingual diffusion across ShareChat. For images containing text that cross language barriers, we see that translation is used to widen accessibility. That said, we also find cases where the same image is associated with very different text (and therefore meanings). This initial characterisation paves the way for more advanced pipelines to understand the dynamics of fake and political content in a multilingual and non-textual setting.
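The idea of clustering visually similar images and then checking which languages each cluster spans can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, so everything below is an assumption made for illustration: a simple average hash over tiny grayscale grids, a Hamming-distance threshold (`max_distance`), single-link grouping via union-find, and the hypothetical names `average_hash`, `hamming` and `cluster_posts`. It is not the pipeline used in the paper.

```python
from collections import defaultdict
from typing import List, Sequence, Set, Tuple

Grid = Sequence[Sequence[int]]  # toy grayscale image, pixel values 0-255
Post = Tuple[str, Grid]         # (language of the post, image pixels)


def average_hash(pixels: Grid) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def cluster_posts(posts: List[Post], max_distance: int = 1) -> List[Set[str]]:
    """Group posts whose hashes are within max_distance (assumed threshold).

    Returns, for each cluster, the set of languages it appears in.
    """
    hashes = [average_hash(pixels) for _, pixels in posts]
    parent = list(range(len(posts)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison is quadratic; acceptable for a toy example only.
    for i in range(len(posts)):
        for j in range(i + 1, len(posts)):
            if hamming(hashes[i], hashes[j]) <= max_distance:
                parent[find(j)] = find(i)

    languages_by_root = defaultdict(set)
    for index, (language, _) in enumerate(posts):
        languages_by_root[find(index)].add(language)
    return list(languages_by_root.values())


# Made-up 2x2 "images": the first two are near-duplicates, the third differs.
posts: List[Post] = [
    ("Hindi", [[10, 200], [220, 30]]),
    ("Tamil", [[12, 190], [210, 25]]),
    ("Telugu", [[200, 10], [20, 230]]),
]
clusters = cluster_posts(posts)
cross_lingual = [langs for langs in clusters if len(langs) > 1]
print([sorted(langs) for langs in cross_lingual])
```

A cluster whose language set has more than one member corresponds to an image that has crossed a language barrier; counting such clusters per source language would be one way to approximate cross-lingual diffusion under these assumptions.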
