ChartReader: Automatic Parsing of Bar-Plots

Scientific figures such as bar graphs are a critical part of scientific research and a predominant way to represent trends and relationships in data. However, manually interpreting and extracting information from graphs is tedious. Because data consumption has grown exponentially over the past few decades, there is a need for automated data inference from bar graphs. ChartReader is a fully automated end-to-end framework that extracts data from bar graphs in scientific research papers, focusing on process engineering and environmental science journals. ChartReader uses a deep learning-based classifier to determine the chart type of a given chart image. We then develop novel heuristic methods for analyzing scientific figures (text detection, pixel grouping, object detection) and address key challenges such as axis detection, legend parsing, and label detection. Our framework achieves 98% and 68% accuracy in parsing x-axis and y-axis ticks, respectively. On the testing corpus, it achieves 83% accuracy in parsing legends and 42% accuracy in parsing data values. We compare the proposed method with state-of-the-art methods and discuss its limitations.
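To make the final data-extraction step concrete, the sketch below shows one common way to turn a detected bar into a data value once y-axis ticks have been parsed: fit a line from tick pixel rows to tick values, then evaluate it at the pixel row of the bar's top edge. This illustrates the general idea only and is not necessarily how ChartReader does it; the function names `fit_axis_scale` and `bar_value`, the assumption of a linear (not logarithmic) axis, and the sample numbers are all hypothetical.

```python
def fit_axis_scale(tick_pixels, tick_values):
    """Least-squares fit of value = slope * pixel_row + intercept.

    tick_pixels: pixel rows of the parsed y-axis ticks (hypothetical input).
    tick_values: numeric labels read at those ticks (hypothetical input).
    Assumes a linear y-axis; a log axis would need a different model.
    """
    n = len(tick_pixels)
    if n < 2 or n != len(tick_values):
        raise ValueError("need at least two ticks with matching values")
    mean_p = sum(tick_pixels) / n
    mean_v = sum(tick_values) / n
    var_p = sum((p - mean_p) ** 2 for p in tick_pixels)
    if var_p == 0:
        raise ValueError("ticks must lie at distinct pixel rows")
    cov = sum((p - mean_p) * (v - mean_v)
              for p, v in zip(tick_pixels, tick_values))
    slope = cov / var_p
    intercept = mean_v - slope * mean_p
    return slope, intercept


def bar_value(bar_top_pixel, slope, intercept):
    """Map the pixel row of a bar's top edge to a data value."""
    return slope * bar_top_pixel + intercept


# Image rows grow downward, so larger values sit at smaller pixel rows.
slope, intercept = fit_axis_scale([400, 300, 200, 100], [0, 10, 20, 30])
estimate = bar_value(250, slope, intercept)
```

Fitting over all parsed ticks, rather than using just two of them, limits the effect of a single misread tick label. This matters because the estimated data value depends directly on how accurately the y-axis ticks were parsed.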
