Automatically quantifying the scientific quality and sensationalism of news records mentioning pandemics: validating a maximum entropy machine-learning model

Abstract

Objective: To develop and validate a method for automatically quantifying the scientific quality and sensationalism of individual news records.

Study design: After retrieving 163,433 news records mentioning the Severe Acute Respiratory Syndrome (SARS) and H1N1 pandemics, a maximum entropy model for inductive machine learning was trained on 500 randomly sampled news records to identify features of the records that correlated with systematic human assessments of their scientific quality and sensationalism. These relationships were then applied computationally to classify 10,000 additional randomly sampled news records. The model was validated by randomly sampling 200 of the records and comparing human assessments of them with the computer assessments.

Results: Compared with human assessments, the computer model correctly assessed the relevance of 86% of news records, the quality of 65% of records, and the sensationalism of 73% of records. Overall, the scientific quality of SARS and H1N1 news media coverage had potentially important shortcomings, but coverage was not overly sensational. Coverage improved slightly between the two pandemics.

Conclusion: Automated methods can evaluate news records faster, cheaper, and possibly better than humans. At the very least, the specific procedure implemented in this study can identify subsets of news records that are far more likely to have particular scientific and discursive qualities.
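The train, classify, and validate workflow described in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: it assumes a binary label (1 = sensational, 0 = not), plain bag-of-words features, and a binary maximum entropy model, which takes the same form as logistic regression, fitted by simple gradient ascent. The toy records, the labels, and the function names (featurize, predict_proba, train, agreement) are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def featurize(text):
    """Bag-of-words counts (an assumed feature set, not the study's)."""
    return Counter(text.lower().split())


def predict_proba(weights, bias, feats):
    """P(label = 1 | record) under a binary maximum entropy (logistic) model."""
    z = bias + sum(weights.get(w, 0.0) * c for w, c in feats.items())
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well inside float range
    return 1.0 / (1.0 + math.exp(-z))


def train(records, labels, epochs=200, lr=0.5, l2=0.01):
    """Fit weights by batch gradient ascent on the L2-penalised log-likelihood."""
    data = [(featurize(r), y) for r, y in zip(records, labels)]
    weights, bias = {}, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w, grad_b = Counter(), 0.0
        for feats, y in data:
            err = y - predict_proba(weights, bias, feats)
            grad_b += err
            for w, c in feats.items():
                grad_w[w] += err * c
        for w, g in grad_w.items():
            old = weights.get(w, 0.0)
            weights[w] = old + lr * (g / n - l2 * old)
        bias += lr * grad_b / n
    return weights, bias


def agreement(human, computer):
    """Share of records on which the human and computer labels match."""
    return sum(h == c for h, c in zip(human, computer)) / len(human)


# Hypothetical human-coded training sample (the study used 500 records).
train_records = [
    "killer virus panic spreads terror",
    "deadly outbreak panic chaos fear",
    "terror grips city as killer flu spreads",
    "study reports vaccine trial results",
    "health agency publishes case data",
    "researchers report trial data on vaccine",
]
train_labels = [1, 1, 1, 0, 0, 0]

# Hypothetical validation sample with human labels (the study used 200).
validation_records = ["panic as killer flu spreads", "agency reports vaccine data"]
human_labels = [1, 0]

weights, bias = train(train_records, train_labels)
predictions = [
    int(predict_proba(weights, bias, featurize(r)) >= 0.5)
    for r in validation_records
]
accuracy = agreement(human_labels, predictions)
print(f"agreement with human assessments: {accuracy:.0%}")
```

In this sketch the agreement figure plays the same role as the percentages reported in the Results: the share of validation records on which the computer label matches the human label. A real application would need far more training data, richer features, and separate models (or a multi-class model) for relevance, quality, and sensationalism.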
