Increasing Video Accessibility for Visually Impaired Users with Human-in-the-Loop Machine Learning

Video accessibility is crucial for blind and visually impaired individuals for education, employment, and entertainment purposes. However, professional video descriptions are costly and time-consuming to produce. Volunteer-created video descriptions are a promising alternative; however, they can vary in quality, and the task can be intimidating for novice describers. We developed a Human-in-the-Loop Machine Learning (HILML) approach to video description that automates video text generation and scene segmentation while allowing humans to edit the output. First-time video describers found our HILML system significantly faster and easier to use than a human-only control condition with no machine learning assistance. Blind and visually impaired users rated both the quality of the video descriptions produced with the HILML system and their resulting understanding of the video topics as significantly higher than those from the human-only condition.