DataTales: Investigating the use of Large Language Models for Authoring Data-Driven Articles

Authoring data-driven articles is a complex process requiring authors to not only analyze data for insights but also craft a cohesive narrative that effectively communicates those insights. The text generation capabilities of contemporary large language models (LLMs) present an opportunity to assist the authoring of data-driven articles and expedite the writing process. In this work, we investigate the feasibility and perceived value of leveraging LLMs to support authors of data-driven articles. We designed a prototype system, DataTales, that leverages an LLM to generate textual narratives accompanying a given chart. Using DataTales as a design probe, we conducted a qualitative study with 11 professionals to evaluate the concept, from which we distilled affordances and opportunities to further integrate LLMs as valuable data-driven article authoring assistants.
