A Baseline Document Planning Method for Automated Journalism

We present a method for content selection and document planning in automated news and report generation from structured statistical data, such as the data published by Eurostat, the European Union’s statistical agency. The method is driven by the data and is highly topic-independent within the domain of statistical datasets. Because the approach does not rely on machine learning, it can bring news automation to the wide variety of domains where no training data is available. It therefore serves as a baseline for document structuring that is low-cost in terms of implementation effort and can be deployed before domain-specific knowledge is introduced.
