An Automated Approach for Interpretation of Statistical Graphics

Text plays a vital role in the analysis of quantitative data, because in statistics data are represented through graphical tools such as bar charts, pie charts, line charts, scatter diagrams, and histograms. Statistical graphics are a valuable tool for representing information visually in multimodal documents. However, the communicative goal of a statistical graphic is often not captured by the document's accompanying text, and novice readers find it difficult to interpret the information a graphic represents. This paper presents an approach to automate chart image classification and information extraction. The study focuses on area charts, an important type of statistical graphic used to depict probability distributions and hypothesis testing. We first classify area charts into different classes, and then design an architecture for chart image classification and for information extraction from each class of area chart. The extracted information is presented as natural language summaries generated with a template-based approach.
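
The final step, template-based summary generation, can be illustrated with a minimal sketch. It assumes the information has already been extracted from the chart image. The record fields, class labels, and template wording below are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from string import Template


@dataclass
class AreaChartInfo:
    """Hypothetical record of values extracted from an area chart."""
    chart_class: str       # assumed class label, e.g. "right-tailed"
    distribution: str      # e.g. "normal"
    critical_value: float  # boundary of the shaded region
    shaded_area: float     # area of the shaded region


# One template per chart class (illustrative wording only).
TEMPLATES = {
    "right-tailed": Template(
        "The chart shows a $distribution distribution with the region "
        "to the right of $critical_value shaded, covering an area of $shaded_area."
    ),
    "left-tailed": Template(
        "The chart shows a $distribution distribution with the region "
        "to the left of $critical_value shaded, covering an area of $shaded_area."
    ),
}


def summarize(info: AreaChartInfo) -> str:
    """Fill the template for the chart's class with the extracted values."""
    template = TEMPLATES[info.chart_class]
    return template.substitute(
        distribution=info.distribution,
        critical_value=f"{info.critical_value:g}",
        shaded_area=f"{info.shaded_area:g}",
    )


if __name__ == "__main__":
    print(summarize(AreaChartInfo("right-tailed", "normal", 1.645, 0.05)))
```

Keeping one template per chart class means the classification result selects the sentence pattern, while the extracted values only fill its slots.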
