Recognition and quality assessment of data charts in mixed-mode documents

Data charts compress large amounts of complex information and convey it efficiently and succinctly. A variety of automated software systems has made such charts easier to create, and they are now routinely inserted into text documents and widely disseminated across many different media. This study addresses the problem of assessing the quality of data charts in mixed-mode documents. Chart quality can assist the document development process and can serve as an additional criterion for search engines such as Google and Yahoo. The quality measures are motivated by principles of visual learning, are based on research in educational psychology and cognitive theories, and use attributes of both the graphic and its textual context. We have implemented the approach and evaluated its effectiveness on a set of documents compiled from the Web. Results of a human study show that the proposed quality measures correlate highly with users' quality ratings for each of the five classes of data charts studied in this research.
