ReVision: automated classification, analysis and redesign of chart images

Poorly designed charts are prevalent in reports, magazines, books and on the Web. Most of these charts are only available as bitmap images; without access to the underlying data it is prohibitively difficult for viewers to create more effective visual representations. In response we present ReVision, a system that automatically redesigns visualizations to improve graphical perception. Given a bitmap image of a chart as input, ReVision applies computer vision and machine learning techniques to identify the chart type (e.g., pie chart, bar chart, scatterplot, etc.). It then extracts the graphical marks and infers the underlying data. Using a corpus of images drawn from the web, ReVision achieves image classification accuracy of 96% across ten chart categories. It also accurately extracts marks from 79% of bar charts and 62% of pie charts, and from these charts it successfully extracts data from 71% of bar charts and 64% of pie charts. ReVision then applies perceptually-based design principles to populate an interactive gallery of redesigned charts. With this interface, users can view alternative chart designs and retarget content to different visual styles.

[1]  Edward R. Tufte,et al.  The Visual Display of Quantitative Information , 1986 .

[2]  Pat Hanrahan,et al.  Polaris: a system for query, analysis and visualization of multi-dimensional relational databases , 2000, IEEE Symposium on Information Visualization 2000. INFOVIS 2000. Proceedings.

[3]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[4]  Chong-Wah Ngo,et al.  Evaluating bag-of-visual-words representations in scene classification , 2007, MIR '07.

[5]  Chew Lim Tan,et al.  A system for understanding imaged infographics and its applications , 2007, DocEng '07.

[6]  Jock D. Mackinlay,et al.  Automating the design of graphical presentations of relational information , 1986, TOGS.

[7]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[8]  Cynthia A. Brewer,et al.  ColorBrewer.org: An Online Tool for Selecting Colour Schemes for Maps , 2003 .

[9]  Jeffrey Heer,et al.  Crowdsourcing graphical perception: using mechanical turk to assess visualization design , 2010, CHI.

[10]  Larry S. Davis,et al.  Classifying Computer Generated Charts , 2007, 2007 International Workshop on Content-Based Multimedia Indexing.

[11]  Jeffrey Heer,et al.  Protovis: A Graphical Toolkit for Visualization , 2009, IEEE Transactions on Visualization and Computer Graphics.

[12]  Robert P. Futrelle,et al.  Recognition and Classification of Figures in PDF Documents , 2005, GREC.

[13]  Honglak Lee,et al.  An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[14]  Chew Lim Tan,et al.  Hough technique for bar charts detection and recognition in document images , 2000, Proceedings 2000 International Conference on Image Processing (Cat. No.00CH37101).

[15]  Roberto Manduchi,et al.  Bilateral filtering for gray and color images , 1998, Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271).

[16]  Pat Hanrahan,et al.  Show Me: Automatic Presentation for Visual Analysis , 2007, IEEE Transactions on Visualization and Computer Graphics.

[17]  David K. Simkin,et al.  An Information-Processing Analysis of Graph Perception , 1987 .

[18]  Jean-Marc Odobez,et al.  Text detection, recognition in images and video frames , 2004, Pattern Recognit..

[19]  Chew Lim Tan,et al.  Semi-automatic Ground Truth Generation for Chart Image Recognition , 2006, Document Analysis Systems.

[20]  Peter L. Brooks,et al.  Visualizing data , 1997 .

[21]  Jiebo Luo,et al.  Review of the State of the Art in Semantic Scene Classification , 2002 .

[22]  Edward Rolf Tufte,et al.  The visual display of quantitative information , 1985 .

[23]  Corinna Cortes,et al.  Support-Vector Networks , 1995, Machine Learning.

[24]  Andrew Zisserman,et al.  Scene Classification Via pLSA , 2006, ECCV.

[25]  Chew Lim Tan,et al.  Extraction of Vectorized Graphical Information from Scientific Chart Images , 2007, Ninth International Conference on Document Analysis and Recognition (ICDAR 2007).

[26]  W. Cleveland,et al.  Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods , 1984 .

[27]  Stephen R. Garner,et al.  WEKA: The Waikato Environment for Knowledge Analysis , 1996 .

[28]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[29]  Maureen C. Stone,et al.  A field guide to digital color , 2003 .

[30]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[31]  Andrew W. Fitzgibbon,et al.  Direct Least Square Fitting of Ellipses , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[32]  Chew Lim Tan,et al.  Model-Based Chart Image Recognition , 2003, GREC.