Writing Strategies for Science Communication: Data and Computational Analysis

Communicating complex scientific ideas without misleading or overwhelming the public is challenging. While science communication guides exist, they rarely offer empirical evidence for how their strategies are used in practice. Writing strategies that can be recognized automatically could support science communication by enabling tools that detect strategies in a text and suggest them to writers. We compile a set of writing strategies drawn from a wide range of prescriptive sources and develop an annotation scheme that allows humans to recognize them. We collect a corpus of 128k English science writing documents and annotate a subset of it. We use the annotations to train transformer-based classifiers and measure the strategies' use in the larger corpus. We find that the use of strategies, such as storytelling and emphasizing the most important findings, varies significantly across publications with different reader audiences.
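As a rough illustration of what measuring a strategy's use across publications could look like once a classifier has scored each document, the sketch below aggregates per-document probabilities by publication. It is not the authors' pipeline: the function name `strategy_prevalence`, the `(publication, probability)` input format, the publication names, and the 0.5 threshold are all assumptions made for this example, and the probabilities are made-up values rather than results.

```python
from collections import defaultdict


def strategy_prevalence(scored_docs, threshold=0.5):
    """Estimate how often one strategy is used in each publication.

    scored_docs: iterable of (publication, probability) pairs, where
    probability is a classifier's score that the document uses the
    strategy (hypothetical input format, assumed for this sketch).
    Returns, per publication, two simple estimates:
      - classify_and_count: share of documents at or above the threshold
      - mean_probability: average predicted probability
    """
    totals = defaultdict(lambda: {"n": 0, "positives": 0, "prob_sum": 0.0})
    for publication, prob in scored_docs:
        entry = totals[publication]
        entry["n"] += 1
        entry["positives"] += int(prob >= threshold)
        entry["prob_sum"] += prob
    return {
        publication: {
            "classify_and_count": entry["positives"] / entry["n"],
            "mean_probability": entry["prob_sum"] / entry["n"],
        }
        for publication, entry in totals.items()
    }


# Made-up scores for two hypothetical publications.
example_scores = [
    ("magazine_a", 0.9), ("magazine_a", 0.2),
    ("magazine_a", 0.7), ("magazine_a", 0.6),
    ("journal_b", 0.1), ("journal_b", 0.4),
]
prevalence = strategy_prevalence(example_scores)
for publication, estimates in sorted(prevalence.items()):
    print(publication, estimates)
```

The two estimates can disagree when classifier scores are poorly calibrated, so comparing them is a cheap sanity check before interpreting differences between publications.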
