Beagle: Automated Extraction and Interpretation of Visualizations from the Web

"How common is interactive visualization on the web?" "What is the most popular visualization design?" "How prevalent are pie charts really?" These questions intimate the role of interactive visualization in the real (online) world. In this paper, we present our approach to answering these questions, along with our findings. First, we introduce Beagle, which mines the web for SVG-based visualizations and automatically classifies them by type (e.g., bar, pie). With Beagle, we extract over 41,000 visualizations across five different tools and repositories and classify them with 85% accuracy across 24 visualization types. Given this visualization collection, we study usage across tools. We find that most visualizations fall under four types: bar charts, line charts, scatter charts, and geographic maps. Though controversial, pie charts are relatively rare in the visualization tools that were studied. Our findings also suggest that the total number of visualization types supported by a given tool could factor into its ease of use. However, this effect appears to be mitigated by providing users with a variety of diverse expert visualization examples.
