Classification of Negative Information on Socially Significant Topics in Mass Media

Mass media not only reflect the activities of state bodies but also shape the informational context, sentiment, depth, and level of significance attributed to state initiatives and social events. A multifaceted and, as far as practicable, quantitative assessment of media activity is important for understanding the media's objectivity, role, and focus and, ultimately, the quality of society’s “fourth estate”. The paper proposes a method for evaluating the media along several modalities (topics, evaluation criteria/properties, classes) that combines topic modeling of text corpora with multiple-criteria decision making. The evaluation proceeds as follows: after a topic model of the corpus is built, the conditional probability distribution of media over topics, properties, and classes is calculated. Several approaches are used to obtain the weights that describe how each topic relates to each evaluation criterion/property and to each class described in the paper: manual high-level labeling, a multi-corpora approach, and an automatic approach. The proposed multi-corpora approach assesses the topical asymmetry between corpora to obtain the weights describing each topic’s relationship to a given criterion/property. These weights, combined with the topic model, can be used to evaluate each document in the corpus against each of the considered criteria and classes. The proposed method was applied to a corpus of 804,829 news publications from 40 Kazakhstani sources, published from 1 January 2018 to 31 December 2019, to classify negative information on socially significant topics. A BigARTM model with 200 topics was built and the proposed model was applied, including filling in the analytic hierarchy process (AHP) table and carrying out all of the necessary high-level labeling procedures. The experiments confirm that the media can, in general, be evaluated using a topic model of the text corpus: an area under the receiver operating characteristic curve (ROC AUC) of 0.81 was achieved in the classification task, which is comparable with the results obtained for the same task with the BERT (Bidirectional Encoder Representations from Transformers) model.
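The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that each document is represented by a vector of topic probabilities (as a topic model such as BigARTM would produce), that a document's score for a criterion is the weighted sum of those probabilities, and that topical asymmetry between two corpora can be approximated by a normalized difference of mean topic probabilities. The asymmetry formula, the function names, and the toy data are all assumptions made for illustration.

```python
from typing import List, Sequence


def asymmetry_weights(theta_a: Sequence[Sequence[float]],
                      theta_b: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical topical-asymmetry weight per topic, in [-1, 1].

    Assumed formula: (mean_a - mean_b) / (mean_a + mean_b), where mean_x
    is the average probability of the topic over the documents of corpus x.
    """
    n_topics = len(theta_a[0])
    weights = []
    for t in range(n_topics):
        mean_a = sum(doc[t] for doc in theta_a) / len(theta_a)
        mean_b = sum(doc[t] for doc in theta_b) / len(theta_b)
        total = mean_a + mean_b
        weights.append((mean_a - mean_b) / total if total > 0 else 0.0)
    return weights


def score_documents(theta: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """Score each document as the weighted sum of its topic probabilities."""
    return [sum(p * w for p, w in zip(doc, weights)) for doc in theta]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy document-topic matrices (rows: documents, columns: 3 topics).
# The values are invented for illustration only.
theta_negative = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]]  # corpus labeled "negative"
theta_neutral = [[0.1, 0.3, 0.6], [0.2, 0.2, 0.6]]   # contrast corpus

weights = asymmetry_weights(theta_negative, theta_neutral)
all_docs = theta_negative + theta_neutral
labels = [1, 1, 0, 0]
scores = score_documents(all_docs, weights)
auc = roc_auc(labels, scores)
print(weights, scores, auc)
```

In this sketch, a topic that is equally frequent in both corpora receives a weight of zero and does not affect the score, while topics concentrated in one corpus push the score toward that corpus. Manually labeled or AHP-derived weights could be passed to `score_documents` in place of the asymmetry weights.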
