Words of Estimative Correlation: Studying Verbalizations of Scatterplots

Natural language and visualization are increasingly deployed together to support data analysis, from multimodal interaction to enriched data summaries and insights. Yet researchers still lack systematic knowledge of how viewers verbalize their interpretations of visualizations and how they interpret verbalizations of visualizations in such contexts. We describe two studies aimed at identifying characteristics of data and charts that are relevant in such tasks. The first study asks participants to verbalize what they see in scatterplots that depict various levels of correlation. The second study then asks participants to choose visualizations that match a given verbal description of correlation. We extract key concepts from the responses, organize them in a taxonomy, and analyze the categorized responses. We observe that participants use a wide range of vocabulary across all scatterplots, but particular concepts are preferred for higher levels of correlation. A comparison between the studies reveals the ambiguity of some of the concepts. We discuss how the results could inform the design of multimodal representations aligned with the data and analytical tasks, and present a research roadmap to deepen the understanding of visualizations and natural language.
