The Efficacy of Collaborative Authoring of Video Scene Descriptions

Most online video content remains inaccessible to people with visual impairments because it lacks audio descriptions that depict the video scenes. Content creators have traditionally relied on professionals to author audio descriptions, but professional services are costly and not readily available. We investigate whether involving novices can yield audio descriptions that are more cost-effective yet still of high quality. Specifically, we designed, developed, and evaluated ViScene, a web-based collaborative audio description authoring tool that enables a sighted novice author and a reviewer, either sighted or blind, to interact and contribute to scene descriptions (SDs), which are text that can be transformed into audio through text-to-speech. Through a mixed-design study with N = 60 participants, we assessed the quality of SDs created by sighted novices with feedback from both sighted and blind reviewers. Our results showed that, with ViScene, novices could produce content that is Descriptive, Objective, Referable, and Clear at a cost of US$2.81 to US$5.48 per video minute (pvm), which is 54% to 96% lower than the cost of professional services. However, the descriptions fell short on other quality dimensions (e.g., Learning, a measure of how well an SD conveys the video’s intended message). While professional audio describers remain the gold standard, ViScene offers a cost-effective alternative for content creators who cannot afford them, ultimately leading to a more accessible medium.
