Calliope: Automatic Visual Data Story Generation from a Spreadsheet

Visual data stories, shown in the form of narrative visualizations such as posters or data videos, are frequently used in data-oriented storytelling to facilitate understanding and memorization of the story content. Although such stories are useful, technical barriers such as data analysis, visualization, and scripting make them difficult to create. Existing authoring tools rely on users' skills and experience, which makes authoring inefficient and still difficult. In this paper, we introduce Calliope, a novel visual data story generation system that automatically creates visual data stories from an input spreadsheet and facilitates easy revision of the generated story through an online story editor. In particular, Calliope incorporates a new logic-oriented Monte Carlo tree search algorithm that explores the data space given by the input spreadsheet to progressively generate story pieces (i.e., data facts) and organize them in a logical order. The importance of each data fact is measured based on information theory, and each data fact is visualized in a chart and captioned with an automatically generated description. We evaluate the proposed technique through three example stories, two controlled experiments, and a series of interviews with 10 domain experts. Our evaluation shows that Calliope is beneficial for efficient visual data story generation.
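The abstract combines two mechanisms: an information-theoretic importance score for data facts and a Monte Carlo tree search (MCTS) that grows a story fact by fact. The Python sketch below illustrates how these could fit together; it is not Calliope's actual algorithm. Several details are assumptions made for illustration only. Importance is taken as self-information, -log2(p), and each fact's probability p in (0, 1] is assumed to be estimated elsewhere. A toy "same subject" rule stands in for the logical relations between consecutive facts, and the reward weights are arbitrary. The names `Fact`, `importance`, `reward`, and `generate_story` are hypothetical. The search itself follows the standard UCT pattern of selection, expansion, simulation, and backpropagation.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Fact:
    """Hypothetical data fact: a label, the column it is about, and its occurrence probability."""
    label: str
    subject: str
    probability: float  # assumed to lie in (0, 1] and to be estimated elsewhere


def importance(fact: Fact) -> float:
    """Self-information in bits: rarer facts are treated as more important (assumed rule)."""
    return -math.log2(fact.probability)


def reward(story: List[Fact], max_info: float) -> float:
    """Assumed reward in [0, 1]: mean normalized importance plus a toy coherence bonus."""
    info = sum(importance(f) for f in story) / (len(story) * max_info)
    pairs = list(zip(story, story[1:]))
    # Toy stand-in for logical links: consecutive facts about the same subject count as coherent.
    coherence = sum(a.subject == b.subject for a, b in pairs) / len(pairs) if pairs else 0.0
    return 0.5 * info + 0.5 * coherence


@dataclass
class Node:
    story: Tuple[Fact, ...]
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    total: float = 0.0


def uct(child: Node, parent_visits: int, c: float = 1.4) -> float:
    if child.visits == 0:
        return math.inf  # always try unvisited children first
    return child.total / child.visits + c * math.sqrt(math.log(parent_visits) / child.visits)


def generate_story(facts: List[Fact], length: int, iterations: int = 2000, seed: int = 0) -> List[Fact]:
    rng = random.Random(seed)
    length = min(length, len(facts))
    max_info = max(importance(f) for f in facts) or 1.0  # avoid dividing by zero if every p == 1
    root = Node(story=())
    for _ in range(iterations):
        node = root
        # 1. Selection: follow the UCT-best child until reaching an unexpanded or complete story.
        while node.children and len(node.story) < length:
            parent = node
            node = max(parent.children, key=lambda ch: uct(ch, parent.visits))
        # 2. Expansion: add one child per unused fact, then pick one of these new children at random.
        if len(node.story) < length:
            node.children = [Node(node.story + (f,), node) for f in facts if f not in node.story]
            node = rng.choice(node.children)
        # 3. Simulation: complete the story with randomly ordered unused facts.
        story = list(node.story)
        rest = [f for f in facts if f not in story]
        rng.shuffle(rest)
        story += rest[: length - len(story)]
        # 4. Backpropagation: credit the reward to every node on the path back to the root.
        r = reward(story, max_info)
        while node is not None:
            node.visits += 1
            node.total += r
            node = node.parent
    # Read out the most-visited path, then fill any gap with the most informative unused facts.
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
    story = list(node.story)
    rest = sorted((f for f in facts if f not in story), key=importance, reverse=True)
    return story + rest[: length - len(story)]


# Usage with made-up facts (labels and probabilities are purely illustrative).
facts = [
    Fact("sales peak in Q4", "sales", 0.05),
    Fact("sales trend rises", "sales", 0.2),
    Fact("profit outlier in Region A", "profit", 0.1),
    Fact("profit share of Region B", "profit", 0.5),
]
print([f.label for f in generate_story(facts, length=3)])
```

The reward is kept within [0, 1] so that the exploration constant in `uct` keeps a consistent scale. Swapping in a different importance measure or a richer set of relations only requires changing `importance` and `reward`. The search loop can stay as it is.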
