A Linguistic Approach to Misinformation in Chinese

Identifying useful information is increasingly important and increasingly difficult. Correct information is crucial when we make decisions, whether in finance and the economy, health, or politics. Yet the amount of misinformation has been rising in all of these areas. Existing work focuses primarily on the truthfulness of information using data in English, and either ignores unverifiable claims or groups them with misinformation (also known as ‘fake news’). However, this approach often disregards misleading information and conspiracy theories, which can be as dangerous as verifiably wrong information. From a linguistic perspective, the present study analyzes the headlines of 69,170 extracted articles in Chinese and identifies their linguistic features. Results show that misinformation in Chinese uses emotive language and hyperbole to attract readers’ attention, which echoes previous studies on clickbait and shows that these tactics are shared across languages. We further argue that these tactics are particularly obvious when the articles are categorized by topic. Through an analysis of commonly used phrases and keywords, we discuss how the word list can be further developed into an identification system for misinformation.
