Document Image Classification: Towards Assisting Visually Impaired

Our work aims to make the visual information in a document accessible to people who are blind or visually impaired. Blind students tend to prefer arts over science subjects in higher education because conveying equations and algorithms to them is seen as difficult. They should not be deprived of knowledge because of a physical disability. In this work, we classify the images in an algorithms textbook as graphs, algorithm images, equations, or network flow diagrams with 98% accuracy. The Convolutional Neural Networks we use predict new data instances with 99% accuracy, except for recurrence tree images. We plan to extend this work to extract the textual information in these images and produce image descriptions as alt text, which can yield better speech output to assist blind readers.
