Extracting references between text and charts via crowdsourcing

News articles, reports, blog posts, and academic papers often include graphical charts that visually reinforce arguments presented in the text. To help readers better understand the relationship between the text and the chart, we present a crowdsourcing pipeline for extracting the references between them. Specifically, we give crowd workers paragraph-chart pairs and ask them to select text phrases as well as the corresponding visual marks in the chart. We then apply automated clustering and merging techniques to unify the references generated by multiple workers into a single set. Comparing the crowdsourced references to a set of gold-standard references using a distance measure based on the F1 score, we find that the average distance between the raw set of references produced by a single worker and the gold standard is 0.54 (out of a maximum of 1.0). When we apply our clustering and merging techniques, the average distance between the unified set of references and the gold standard falls to 0.39, an improvement of 27%. We conclude with an interactive document-viewing application that uses the extracted references: readers can select phrases in the text and the system highlights the related marks in the chart.
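As a concrete illustration of such an F1-based distance, the sketch below shows one plausible formulation: each reference is treated as a (text phrase, set of chart marks) pair, a worker's set is scored against the gold standard by precision and recall over exactly matching pairs, and the distance is defined as 1 − F1, so a perfect match scores 0.0 and a complete mismatch scores 1.0. The pair representation, the exact-match criterion, and all names below are assumptions made for this sketch, not the paper's precise definition.

```python
# Hedged sketch: F1-based distance between a crowdsourced reference set and a
# gold-standard set. A reference is assumed to be a (text_phrase, marks) pair,
# where marks is a frozenset of chart-mark identifiers; matching is exact.

def f1_distance(worker_refs, gold_refs):
    """Return 1 - F1, so 0.0 means perfect agreement and 1.0 means no overlap."""
    worker = set(worker_refs)
    gold = set(gold_refs)
    if not worker and not gold:
        return 0.0                          # nothing to extract, nothing missed
    true_positives = len(worker & gold)     # references present in both sets
    precision = true_positives / len(worker) if worker else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 1.0                          # no overlap at all
    f1 = 2 * precision * recall / (precision + recall)
    return 1.0 - f1

# Example (hypothetical data): a worker recovers two of three gold references
# and adds one spurious reference.
gold = [("sales rose 20%", frozenset({"bar_2012"})),
        ("a sharp decline", frozenset({"bar_2008"})),
        ("the 2010 peak", frozenset({"bar_2010"}))]
worker = [("sales rose 20%", frozenset({"bar_2012"})),
          ("the 2010 peak", frozenset({"bar_2010"})),
          ("steady growth", frozenset({"bar_2011"}))]
print(f1_distance(worker, gold))  # ~0.33: precision = recall = 2/3, F1 = 2/3
```

Under this formulation, unifying several workers' references before scoring (as the clustering and merging step does) can only help to the extent that it removes spurious pairs and fills in missed ones, which is consistent with the reported drop from 0.54 to 0.39.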
