Improving the Accessibility of Scientific Documents: Current State, User Needs, and a System Solution to Enhance Scientific PDF Accessibility for Blind and Low Vision Users

The majority of scientific papers are distributed as PDFs, a format that poses accessibility challenges, especially for blind and low vision (BLV) readers. We characterize the scope of this problem by assessing the accessibility of 11,397 PDFs published between 2010 and 2019 and sampled across various fields of study, finding that only 2.4% of these PDFs satisfy all of our defined accessibility criteria. We introduce the SciA11y system to offset some of the issues around inaccessibility. SciA11y incorporates several machine learning models to extract the content of scientific PDFs and render this content as accessible HTML, with novel navigational features added to support screen reader users. An intrinsic evaluation of extraction quality indicates that the majority of HTML renders (87%) produced by our system have either no readability issues or only some. We perform a qualitative user study to understand the needs of BLV researchers when reading papers and to assess whether the SciA11y system could address these needs. We summarize our user study findings as a set of five design recommendations for accessible scientific reader systems. User response to SciA11y was positive: all users said they would be likely to use the system in the future, and some stated that the system, if available, would become their primary workflow. We successfully produce HTML renders for over 12M papers, of which an open access subset of 1.5M is available for browsing at https://scia11y.org/.
